"do" block and hard scope

yvikhlya · September 11, 2023, 12:33am

Could someone explain to me why in this case:

y=0
map(1:10) do x
  y=x^2
end

the outer y does not get modified, as stated in the documentation, but in this case

gm=zeros(Float64, Nlev, Nt)
var=zeros(Union{Missing, Float32}, Nx, Ny, Nlev)
ave_dims=(1,2)
for (tt, ff) in enumerate(files)
    @show tt, ff
    # Seems like here we need to proceed file by file instead of using
    # a multi-file dataset
    NCDataset(ff) do ds
        var[:,:,:]=ds[varname][:,:,:,1]::Array{Union{Missing, Float32},3}
        gm[:,tt]=dropdims(weighted_ave(var, wght, ave_dims), dims=ave_dims)
    end
end

the outer gm (global mean) gets modified and has correct values? In general, I have a hard time understanding how I can read a dataset in a safe way using a do block

NCDataset(fname) do ds
  var1=ds["var1"]
  var2=ds["var2"]
end

but make var1, var2, etc. visible outside the do block within a function?

johnmyleswhite · September 11, 2023, 1:09am

In your first example, you assign to a variable called y in the body.
In your second example, you do not assign to a variable called gm in the body. Instead you call the mutating function setindex! on the global variable gm since, in general, x[i] = y is syntactic sugar for setindex!(x, y, i).
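
A minimal sketch of that distinction, meant to be run at top level (REPL or script) so that `y` and `acc` are globals. The name `acc` is illustrative and does not come from the original post.

```julia
# Run at top level, so `y` and `acc` are global variables.
y = 0
map(1:10) do x
    y = x^2          # plain assignment: creates a new local `y` inside the closure
end

acc = zeros(Int, 10)
map(1:10) do i
    acc[i] = i^2     # indexed assignment: equivalent to setindex!(acc, i^2, i)
end

println(y)           # the global `y` was never assigned by the do block
println(acc[end])    # the global array was mutated in place
```

The second block never assigns to the name `acc`; it only calls a function on the object that `acc` refers to, so no new local is introduced.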

yvikhlya · September 11, 2023, 1:17am

Oh, I see. So, in the first case it created a local variable y, but in the second case there is no need to create a local variable, and a function is applied to the outer variable. I think I got it now. Thanks!

yvikhlya · September 11, 2023, 2:02am

Yep indeed

function f!(x)
    global y = x
end

y=0

map(1:10) do x
    f!(x)
end

julia> y
10

yvikhlya · September 11, 2023, 2:05am

I think I have a solution to the second problem too

var1,var2=NCDataset(fname) do ds
  var1=ds["var1"]
  var2=ds["var2"]
  return var1,var2
end

Is this correct?

Sevi · September 11, 2023, 7:54am

Yes, the do block creates and executes a function that takes ds as its argument and returns var1 and var2. These are directly assigned to the “outside” variables var1 and var2. (*)

You can also skip the redundant variables if using the same variable name is confusing (at least in this short example it might be more readable):

var1, var2 = NCDataset(fname) do ds
  return ds["var1"], ds["var2"]
end

(*) EDIT: The second part is not always true, see next answers.

GunnarFarneback · September 11, 2023, 11:38am

To nitpick, the do block returns var1 and var2 to wherever inside NCDataset the anonymous function was called, whereas the outer variables are assigned the return value of NCDataset. Whether those are the same depends on how NCDataset is implemented.

Sevi · September 12, 2023, 5:51am

Thanks, you’re absolutely right! Whether it works like in the example depends on how NCDataset is implemented. I guess it looks something like this:

function NCDataset(f, fname)
    ds = ... # Read the file at fname
    return f(ds) # Execute the `do` function and return its value
end

f is whatever was supplied with the do syntax. But whether NCDataset returns the value of f has to be checked. In principle it could return anything.

function NCDataset(f, fname)
    ds = ... # Read the file at fname
    results = f(ds) # Execute the `do` function and store value in local variable
    return nothing # In this case, the `results` of `f` are not returned
end

One way to solve the problem in this case would be to already define the variables before calling the function (although you should probably avoid using global and put everything into a function if performance is important here):

var1 = nothing
var2 = nothing

NCDataset(fname) do ds
  global var1, var2
  var1=ds["var1"]
  var2=ds["var2"]
  # return doesn't matter now
end
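
A runnable sketch of the two possibilities discussed above, plus one way to capture values without `global`. The functions `open_returning`, `open_discarding` and `load_pair`, and the `Dict` standing in for a dataset, are all hypothetical; none of this is the NCDatasets API.

```julia
# Hypothetical stand-ins for a dataset opener; a Dict plays the role of the dataset.
fake_dataset() = Dict("var1" => [1.0, 2.0], "var2" => [3.0, 4.0])

function open_returning(f, name)
    ds = fake_dataset()       # pretend this was read from `name`
    return f(ds)              # forwards the value of the do block
end

function open_discarding(f, name)
    ds = fake_dataset()
    f(ds)
    return nothing            # drops the value of the do block
end

# Works with either opener: the result is stored in a container that is
# mutated via setindex!, so nothing depends on the opener's return value.
function load_pair(name)
    out = Ref{Any}(nothing)
    open_discarding(name) do ds
        out[] = (ds["var1"], ds["var2"])
    end
    return out[]
end

a = open_returning(ds -> (ds["var1"], ds["var2"]), "demo.nc")
b = open_discarding(ds -> (ds["var1"], ds["var2"]), "demo.nc")
c = load_pair("demo.nc")
```

Here `a` holds the two arrays, `b` is `nothing`, and `c` recovers the arrays even though the opener discards the block's value.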

yvikhlya · September 12, 2023, 12:01pm

Using globals is what I initially did, then I started to doubt that this is right. My data sets are huge and performance is indeed important. NCDatasets itself is type unstable, by the way, and the documentation recommends using function barriers. This is what I am going to implement.

Sevi · September 12, 2023, 12:07pm

Not a user of NCDatasets, but I would expect that the type instability of NCDataset is not a big problem, unless you are handling a lot of small datasets. If there are a few big ones and the time it takes to process a dataset is much longer than the time it takes to call NCDataset, the impact should be small. (As long as you use a function barrier before you process the dataset, as you mentioned.)

yvikhlya · September 12, 2023, 12:16pm

NCDatasets is type unstable because there is no way to know variable types before you open a data set and read the metadata. So, their documentation recommends separating the reading of the metadata (which is type unstable) from the analysis of the data with function barriers. The analysis is type stable in this case. This makes sense to me.
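
A minimal sketch of the function-barrier pattern, assuming a `Dict{String,Any}` as a stand-in for a file whose variable types are only known at run time. The names `read_variable`, `mean_of_squares` and `analyze` are hypothetical and not part of NCDatasets.

```julia
# Hypothetical type-unstable reader: the compiler can only infer `Any` here.
function read_variable(store::Dict{String,Any}, name)
    return store[name]
end

# Kernel behind the barrier: specialized for each concrete array type it receives.
function mean_of_squares(data::AbstractArray{<:Real})
    total = zero(float(eltype(data)))
    for v in data
        total += v^2
    end
    return total / length(data)
end

function analyze(store, name)
    data = read_variable(store, name)   # type-unstable step
    return mean_of_squares(data)        # one dynamic dispatch, then type-stable code
end

store = Dict{String,Any}("t" => Float32[1, 2, 3], "p" => [1.0, 2.0])
println(analyze(store, "t"))
println(analyze(store, "p"))
```

The cost of the instability is a single dynamic dispatch per call to `mean_of_squares`, which is negligible when the loop inside it does most of the work.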

Alexander-Barth · September 12, 2023, 1:40pm

Yes, I confirm that this is correct.

Under the old (Julia v0.6) scoping rules, do-blocks were a bit easier to use for this kind of use case.

What I use most of the time is (similar to Sevi's example):

var1, var2 = NCDataset(fname) do ds
  ds["var1"][:], ds["var2"][:] # assuming they are 1d arrays
end
# var1, var2: are julia arrays

The definition of NCDataset(function, ...) is what you expected: it returns the result of the do-block function.

Or sometimes having a load function is more readable:

function loadstuff(fname)
  NCDataset(fname) do ds
    return (ds["var1"][:], ds["var2"][:]) # assuming they are 1d arrays
  end
end
var1, var2 = loadstuff(fname)
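
The `[:]` in these examples copies the data into Julia arrays while the dataset is still open. A sketch of why that matters, using Base's `open` on a temporary file as an analogy; that NCDatasets variables behave like the stream here once the dataset is closed is an assumption, not something verified in this thread.

```julia
# Create a small temporary file to read back.
path, io = mktemp()
write(io, "1 2 3")
close(io)

# Returning the handle itself: `open` closes it as soon as the block finishes.
handle = open(path) do f
    f
end

# Returning the contents: the data is copied out while the file is still open.
contents = open(path) do f
    read(f, String)
end

println(isopen(handle))
println(contents)
```

Only `contents` remains usable after the block; `handle` refers to a stream that is already closed.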

yvikhlya · September 12, 2023, 8:11pm

Actually, if a do block is introduced inside a function, it creates a soft scope, not a hard scope, like any inner function:

function main()
    y=0
    map(1:10) do x
        y=x
    end
    return y
end

y=main()

y
10

I feel I have to re-read the entire “Scope of Variables” section of the manual more carefully.

Sevi · September 13, 2023, 6:13am

Yes, there’s a lot to unpack in the scoping rules.

I think the relevant part is this (but please, anyone, correct me if I’m wrong!):

If you assign to an existing local, it always updates that existing local: you can only shadow a local by explicitly declaring a new local in a nested scope with the local keyword.

This is from the section on local scope in “Scope of Variables” in the Julia manual.

Barget · September 18, 2023, 8:59pm

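Just to disambiguate things for anyone (likely new to Julia) who reads this and wonders: Sevi’s answer is totally correct, but I want to add explicitly that the do block is still (and always remains) a hard scope, as per the link Sevi provided. It does not “become” a soft scope because it is defined within a function. The difference is that inside a function the outer y is a local variable, and assigning to an existing local always updates it, whereas at top level the outer y is a global, and a hard scope creates a new local instead of assigning to the global.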
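
A short sketch of both behaviours inside a function; the function names `update_outer` and `shadow_outer` are made up for illustration.

```julia
function update_outer()
    y = 0
    map(1:10) do x
        y = x            # `y` is an existing local of the enclosing function: updated
    end
    return y
end

function shadow_outer()
    y = 0
    map(1:10) do x
        local y = x      # explicit `local`: a new variable that shadows the outer `y`
    end
    return y
end

println(update_outer())
println(shadow_outer())
```

In both functions the do block is a hard scope; the only thing that changes the outcome is whether the assigned name already exists as a local in an enclosing scope and whether `local` is used.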
