Calling a package function with a parallelized for loop

I want to run some code in parallel, but I found some strange behavior that I would like to understand.
Here is a small working example:

using Random;
using Statistics;
using Distributed;
using SharedArrays;

nprocs()
addprocs(3);

nreps = 5;
result = SharedArray{Float64, 1}(nreps);
@distributed for i = 1:nreps
    data = randn(20);
    result[i] = Statistics.mean(data);
end;

This silly code simply computes the mean of a random sample from N(0, 1) of size 20, nreps times.
What bothers me is that the code breaks if the second-to-last line is replaced by the following:

    result[i] = mean(data);

Now, array ‘result’ was never changed in the for loop (it’s still full of 0s). Why is this? How can I simply call a function by its name, without explicitly referring to its package? And why isn’t function ‘randn’ suffering from the same problem?

This is a problem to me because the for loop that I really want to implement has many calls to a variety of functions (e.g., var, sd, tdistcdf), and keeping track of the package of each function seems a waste of typing effort and space. I do understand this behavior when there are multiple packages offering a function with the same name (this also happens to me a lot in R, which is my main programming language), but that is definitely not the explanation here.

Many thanks in advance,
bigoten

The first problem is that the load order puts using Statistics before addprocs(3). The using statements at the top of your script are not evaluated on workers that are added later, so you should run using Statistics after addprocs(3).
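A quick way to see this for yourself is to ask a worker whether the name mean is visible in its Main module. This is only a minimal sketch: it assumes two local workers can be started, and the variable names mean_on_master and mean_on_worker are made up for illustration.

```julia
using Distributed
using Statistics              # evaluated on the master only

addprocs(2)                   # workers are created after the `using` above
w = first(workers())

# Is the name `mean` visible in Main on the master, and on one worker?
mean_on_master = isdefined(Main, :mean)
mean_on_worker = remotecall_fetch(() -> isdefined(Main, :mean), w)

println("master: ", mean_on_master, ", worker: ", mean_on_worker)
```

The master should report that mean is visible while the worker does not, which is why the unqualified call fails inside the loop.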

The second problem is that even when using is run on the master while workers are present, the package's names are not brought into Main on the workers the way they are on the master, so on the workers you have to qualify the call with the module name. You can fix this by executing @everywhere using Statistics, which makes the exported names of Statistics available in Main on every process.
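Putting the first two points together, a corrected version of the example could look roughly like the sketch below. It keeps the names from the question; loading SharedArrays with @everywhere as well and prefixing the loop with @sync (so the script waits for the workers before printing) are additions of mine, not something the original script did.

```julia
using Distributed
addprocs(3)                                   # start the workers first

# Load the packages on every process, so their exported names
# (mean, SharedArray, ...) are visible in Main on the workers too.
@everywhere using Statistics, SharedArrays

nreps = 5
result = SharedArray{Float64,1}(nreps)

# @sync blocks until every worker has finished its share of the loop.
@sync @distributed for i = 1:nreps
    data = randn(20)            # randn is exported by Base, so no extra `using` is needed
    result[i] = mean(data)      # the unqualified call now resolves on the workers
end

println(result)
```

With this ordering you can call var, std and friends by their plain names inside the loop, the same way you would in serial code.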

The third problem, which is not really your fault, is that @distributed apparently doesn't print worker errors (such as mean not being defined in Main) on the master. There may well be an issue for this on the Julia GitHub tracker, but I haven't managed to locate one that matches yet.
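One workaround that may help while debugging is to wait on the loop with @sync and wrap it in try/catch. As far as I can tell, waiting on the loop rethrows the worker's error on the master instead of dropping it silently. Treat this as a sketch under that assumption; loop_fails is a hypothetical helper name, and the exact exception type you see may differ between Julia versions.

```julia
using Distributed
using Statistics          # loaded on the master only, as in the question
addprocs(2)

# Hypothetical helper: returns true if the parallel loop raised an error
# that reached the master process.
function loop_fails()
    try
        @sync @distributed for i = 1:4
            mean(randn(20))         # `mean` is not defined in Main on the workers
        end
        return false
    catch err
        showerror(stderr, err)      # print whatever the workers reported
        println(stderr)
        return true
    end
end

failed = loop_fails()
println("loop raised an error: ", failed)
```

Without the @sync, the loop returns immediately and nothing waits for the workers, which matches the silent "result is still all zeros" outcome from the question.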


Thank you @jpsamaroo! My code is finally running thanks to your explanation, and I now understand better what is happening. Thanks a lot :+1:
