The outer y does not get modified, as stated in the documentation. But in this case,
gm = zeros(Float64, Nlev, Nt)
var = zeros(Union{Missing, Float32}, Nx, Ny, Nlev)
ave_dims = (1, 2)
for (tt, ff) in enumerate(files)
    @show tt, ff
    # Seems like here we need to proceed file by file instead of using
    # a multi-file dataset
    NCDataset(ff) do ds
        var[:, :, :] = ds[varname][:, :, :, 1]::Array{Union{Missing, Float32}, 3}
        gm[:, tt] = dropdims(weighted_ave(var, wght, ave_dims), dims=ave_dims)
    end
end
the outer gm (global mean) does get modified and has the correct values? In general, I have a hard time understanding how I can read a dataset safely using a do block:
NCDataset(fname) do ds
    var1 = ds["var1"]
    var2 = ds["var2"]
end
but make var1, var2, etc. visible outside the do block, within a function?
In your first example, you assign to a variable called y in the body.
In your second example, you do not assign to a variable called gm in the body. Instead, you call the mutating function setindex! on the global variable gm, since, in general, x[i] = y is syntactic sugar for setindex!(x, y, i).
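A minimal runnable sketch of this difference (plain Julia, no NCDatasets involved; the names y and gm just mirror the examples above):

```julia
# At top level, assignment inside a do-block's anonymous function creates a
# new local, while indexed assignment (setindex!) mutates the outer array.
y = 1
gm = zeros(3)

foreach(1:1) do _
    y = 2          # creates a local y inside the anonymous function; global y untouched
    gm[1] = 42.0   # sugar for setindex!(gm, 42.0, 1): mutates the global gm in place
end

@show y      # y = 1
@show gm[1]  # gm[1] = 42.0
```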
Oh, I see. So, in first case it created a local variable y, but in second case there is no need to create a local variable, and a function is applied to an outer variable. I think I got it now. Thanks!
Yes, the do block creates and executes a function that takes ds as its argument and returns var1 and var2. These are directly assigned to the “outside” variables var1 and var2. (*)
You can also skip the redundant variables if using the same variable name is confusing (at least in this short example, it might be more readable):
var1, var2 = NCDataset(fname) do ds
    return ds["var1"], ds["var2"]
end
(*) EDIT: The second part is not always true, see next answers.
To nitpick: the do block returns var1 and var2 to wherever in NCDataset the anonymous function was called, whereas the outer variables are assigned the return value of NCDataset itself. Whether those are the same depends on how NCDataset is implemented.
Thanks, you’re absolutely right! Whether it works like in the example depends on how NCDataset is implemented. I guess it looks something like this:
function NCDataset(f, fname)
    ds = ...       # Read the file at fname
    return f(ds)   # Execute the `do` function and return its value
end
f is whatever was supplied with the do syntax. But whether NCDataset returns the value of f has to be checked; in principle, it could return anything:
function NCDataset(f, fname)
    ds = ...          # Read the file at fname
    results = f(ds)   # Execute the `do` function and store its value in a local variable
    return nothing    # In this case, the `results` of `f` are not returned
end
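The point can be demonstrated with two toy outer functions (hypothetical names, not part of NCDatasets): the do-block's result only reaches the caller if the outer function passes it through.

```julia
# `f(args...) do x ... end` is sugar for `f(x -> ..., args...)`; what the
# caller receives depends entirely on what the outer function returns.
returns_result(f, x) = f(x)               # passes f's value through
swallows_result(f, x) = (f(x); nothing)   # calls f but discards its value

a = returns_result(10) do x
    x + 1
end

b = swallows_result(10) do x
    x + 1
end

@show a  # 11
@show b  # nothing
```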
One way to solve the problem in this case would be to define the variables before calling the function (although, if performance is important, you should probably avoid global and put everything into a function):
var1 = nothing
var2 = nothing
NCDataset(fname) do ds
    global var1, var2
    var1 = ds["var1"]
    var2 = ds["var2"]
    # the return value doesn't matter now
end
Using globals is what I initially did, but then I started to doubt whether this was right. My datasets are huge, and performance is indeed important. NCDatasets itself is type-unstable, by the way, and its documentation recommends using function barriers, which is what I am going to implement.
I'm not a user of NCDatasets, but I would expect that the type instability of NCDataset is not a big problem unless you are handling a lot of small datasets. If there are a few big ones, and the time it takes to process a dataset is much longer than the call to NCDataset, the impact should be small. (As long as you use a function barrier before you process the dataset, as you mentioned.)
NCDatasets is type unstable because there is no way to know variable types before you open a data set and read meta data. So, their documentation recommends to separate reading the meta data (which is type unstable) and analysis of data with function barriers, the analysis is type stable in this case. This makes sense to me.
Yes, I confirm that this is correct.
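The function-barrier idea can be sketched without NCDatasets at all (the names below are illustrative): the loading step is type-unstable because the element type is only known at runtime, but the analysis function is compiled for the concrete array type it receives, so the hot part is type-stable.

```julia
# Type-stable barrier: Julia specializes this for the concrete type of `data`.
analyze(data::AbstractArray) = sum(skipmissing(data))

function load_and_analyze(flag)
    # Type-unstable step, standing in for reading a variable from a file:
    # the element type depends on a runtime value.
    data = flag ? Union{Missing, Float32}[1.0f0, missing] : [1.0, 2.0]
    analyze(data)   # one dynamic dispatch here; everything inside is stable
end

@show load_and_analyze(true)   # 1.0f0
@show load_and_analyze(false)  # 3.0
```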
Under the old (Julia v0.6) scoping rules, do-blocks were a bit easier to use for this kind of use case.
What I use most of the time is (similar to the example of sevi):
var1, var2 = NCDataset(fname) do ds
    ds["var1"][:], ds["var2"][:]  # assuming they are 1d arrays
end
# var1, var2 are now plain Julia arrays
The definition of NCDataset(f, ...) is what you expected: it returns the result of the do-block function.
or sometimes having a load function is more readable:
function loadstuff(fname)
    NCDataset(fname) do ds
        return (ds["var1"][:], ds["var2"][:])  # assuming they are 1d arrays
    end
end

var1, var2 = loadstuff(fname)
Yes, there’s a lot to unpack in the scoping rules
I think the relevant part is this (but please, anyone, correct me if I’m wrong!):
If you assign to an existing local, it always updates that existing local: you can only shadow a local by explicitly declaring a new local in a nested scope with the local keyword.
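That rule can be checked with a small self-contained example using nested let blocks:

```julia
function shadow_demo()
    x = 1
    let
        x = 2        # assigns to the existing local x of the enclosing scope
    end
    a = x            # 2

    let
        local x      # explicitly declares a NEW local, shadowing the outer x
        x = 3        # only touches the shadowing local
    end
    b = x            # still 2: the outer x was never reassigned

    (a, b)
end

@show shadow_demo()  # (2, 2)
```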
Just to disambiguate things for anyone (likely new to Julia) who might read this and wonder: Sevi's answer is totally correct, but I wanted to explicitly add that the do block is still (and will always remain) a hard scope, as per the link Sevi provided. It does not "become" a soft scope just because it is defined within a function.