"do" block and hard scope

Could someone explain to me why, in this case:

y=0
map(1:10) do x
  y=x^2
end

the outer y does not get modified, as stated in the documentation, but in this case

gm=zeros(Float64, Nlev, Nt)
var=zeros(Union{Missing, Float32}, Nx, Ny, Nlev)
ave_dims=(1,2)
for (tt, ff) in enumerate(files)
    @show tt, ff
    # Seems like here we need to proceed file by file instead of using
    # a multi-file dataset
    NCDataset(ff) do ds
        var[:,:,:]=ds[varname][:,:,:,1]::Array{Union{Missing, Float32},3}
        gm[:,tt]=dropdims(weighted_ave(var, wght, ave_dims), dims=ave_dims)
    end
end

the outer gm (global mean) gets modified and has correct values? In general, I have a hard time understanding how I can read a dataset in a safe way using a do block

NCDataset(fname) do ds
  var1=ds["var1"]
  var2=ds["var2"]
end

but make var1, var2, etc. visible outside the do block, within a function?

  • In your first example, you assign to a variable called y in the body.
  • In your second example, you do not assign to a variable called gm in the body. Instead you call the mutating function setindex! on the global variable gm since, in general, x[i] = y is syntactic sugar for setindex!(x, y, i).
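To make the difference concrete, here is a minimal sketch run at top level. The array `a` is a made-up stand-in for `gm`, not something from the original code:

```julia
# Case 1: plain assignment inside the do block.
# The do block is an anonymous function (a hard scope), so `y = ...`
# creates a new local `y` and leaves the global one alone.
y = 0
map(1:10) do x
    y = x^2
end

# Case 2: indexed assignment inside the do block.
# `a[x] = ...` is lowered to `setindex!(a, x^2, x)`: no variable named `a`
# is assigned, the global array is only read and then mutated in place.
a = zeros(Int, 10)   # hypothetical stand-in for `gm`
map(1:10) do x
    a[x] = x^2
end

@show y   # y = 0
@show a   # a = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```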

Oh, I see. So, in the first case it created a local variable y, but in the second case there is no need to create a local variable, and a function is applied to an outer variable. I think I got it now. Thanks!


Yep indeed

function f!(x)
    global y = x   # assigns to the global y
end

y = 0

map(1:10) do x
    f!(x)
end

julia> y
10

I think I have a solution to the second problem too

var1,var2=NCDataset(fname) do ds
  var1=ds["var1"]
  var2=ds["var2"]
  return var1,var2
end

Is this correct?


Yes, the do block creates and executes a function that takes ds as an argument and returns var1 and var2. These are directly assigned to the “outside” variables var1 and var2. (*)

You can also skip the redundant variables if using the same variable name is confusing (at least in this short example it might be more readable):

var1, var2 = NCDataset(fname) do ds
  return ds["var1"], ds["var2"]
end

(*) EDIT: The second part is not always true, see next answers.


To nitpick: the do block returns var1 and var2 to wherever inside NCDataset the anonymous function was called, whereas the outer variables are assigned the return value of NCDataset. Whether those are the same depends on how NCDataset is implemented.

Thanks, you’re absolutely right! Whether it works like in the example depends on how NCDataset is implemented. I guess it looks something like this:

function NCDataset(f, fname)
    ds = ... # Read the file at fname
    return f(ds) # Execute the `do` function and return its value
end

f is whatever was supplied with the do syntax. But whether NCDataset returns the value of f has to be checked. In principle it could return anything.

function NCDataset(f, fname)
    ds = ... # Read the file at fname
    results = f(ds) # Execute the `do` function and store value in local variable
    return nothing # In this case, the `results` of `f` are not returned
end
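A runnable toy version of the two possibilities, in case it helps. Both wrapper functions are hypothetical and have nothing to do with the actual NCDatasets implementation:

```julia
# Hypothetical wrapper that passes the do-block result on to its caller.
function returns_value(f, x)
    return f(x)
end

# Hypothetical wrapper that runs the do-block but throws the result away.
function discards_value(f, x)
    f(x)
    return nothing
end

# `g(args...) do v ... end` is the same as `g(v -> ..., args...)`,
# so the do block arrives as the first argument `f` in both cases.
a = returns_value(3) do v
    v + 1
end

b = discards_value(3) do v
    v + 1
end

@show a   # a = 4
@show b   # b = nothing
```

So the value you get on the left-hand side is always whatever the outer function decides to return, not necessarily what the do block returned.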

One way to solve the problem in this case would be to already define the variables before calling the function (although you should probably avoid using global and put everything into a function if performance is important here):

var1 = nothing
var2 = nothing

NCDataset(fname) do ds
  global var1, var2
  var1=ds["var1"]
  var2=ds["var2"]
  # return doesn't matter now
end

Using globals is what I initially did, then I started to doubt that this is right. My datasets are huge and performance is indeed important. NCDatasets itself is type unstable, by the way, and the documentation recommends using function barriers, so this is what I am going to implement.


Not a user of NCDatasets, but I would expect that the type instability of NCDataset is not a big problem, unless you are handling a lot of small datasets. If there are a few big ones and the time it takes to process a dataset is much longer than the time it takes to call NCDataset, the impact should be small. (As long as you use a function barrier before you process the dataset, as you mentioned.)

NCDatasets is type unstable because there is no way to know variable types before you open a dataset and read its metadata. So, their documentation recommends separating the reading of the metadata (which is type unstable) from the analysis of the data with function barriers; the analysis is type stable in this case. This makes sense to me.
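For anyone unfamiliar with the term, here is a minimal sketch of a function barrier. It does not use NCDatasets at all: `load_data`, `mean_square` and `process` are made-up names, and `load_data` only imitates a reader whose element type is known at run time:

```julia
# Stand-in for a type-unstable reader: the element type of the result
# depends on a run-time value, so it is not a single concrete type.
load_data(kind) = kind == :single ? rand(Float32, 100) : rand(Float64, 100)

# The "kernel" behind the barrier. Julia compiles a specialized version
# for each concrete array type it is called with.
function mean_square(v::AbstractVector{T}) where {T<:Real}
    acc = zero(T)
    for x in v
        acc += x * x
    end
    return acc / length(v)
end

function process(kind)
    data = load_data(kind)     # type not known to the compiler here
    return mean_square(data)   # barrier: the loop inside runs type-stable code
end

@show typeof(process(:single))   # Float32
@show typeof(process(:double))   # Float64
```

The unstable part is paid once per call to `process`, while the loop inside `mean_square` works on a concretely typed array.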


Yes, I confirm that this is correct.

Under the old (Julia v0.6) scoping rules, do-blocks were a bit easier to use for this kind of use case.

What I use most of the time is (similar to the example of sevi):

var1, var2 = NCDataset(fname) do ds
  ds["var1"][:], ds["var2"][:] # assuming they are 1d arrays
end
# var1, var2: are julia arrays

The definition of NCDataset(function, ...) is what you expected: it returns the result of the do-block function.

Sometimes having a load function is more readable:

function loadstuff(fname)
  NCDataset(fname) do ds
    return (ds["var1"][:], ds["var2"][:]) # assuming they are 1d arrays
  end
end
var1, var2 = loadstuff(fname)

Actually, if a do block is introduced inside a function, it creates a soft scope, not a hard scope, like any inner function:

function main()
    y=0
    map(1:10) do x
        y=x
    end
    return y
end

y=main()

y
10

I feel I have to re-read the entire “Scope of Variables” section of the manual more carefully.


Yes, there’s a lot to unpack in the scoping rules :sweat_smile:

I think the relevant part is this (but please, anyone, correct me if I’m wrong!):

If you assign to an existing local, it always updates that existing local: you can only shadow a local by explicitly declaring a new local in a nested scope with the local keyword.

(from the section on local scope in “Scope of Variables”, The Julia Language manual)


Just to disambiguate things for anyone (likely new to Julia) who reads this and wonders: Sevi’s answer is totally correct, but I wanted to add explicitly that the do block is still (and always remains) a hard scope, as per the link Sevi provided. It does not “become” a soft scope because it is defined within a function. What changes is only that y is already a local of the enclosing function, so assigning to it updates that existing local.
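A small sketch of that rule, with two made-up function names, in case a side-by-side comparison helps:

```julia
# `y` is already a local of the enclosing function, so the assignment in
# the do block updates that existing local (the closure captures it).
function captures_outer()
    y = 0
    map(1:10) do x
        y = x
    end
    return y
end

# An explicit `local` declaration in the nested scope shadows the outer `y`,
# so the enclosing function's `y` is never touched.
function shadows_outer()
    y = 0
    map(1:10) do x
        local y = x
    end
    return y
end

@show captures_outer()   # captures_outer() = 10
@show shadows_outer()    # shadows_outer() = 0
```

In both functions the do block is a hard scope; the only difference is whether a local named y already exists in an enclosing local scope.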
