Confusing behaviour of include("source.jl") in parallel codes

Julia’s rules for writing parallelized code have caused me a lot of frustration since I started using Julia many years ago, and I have encountered another issue today. When using @everywhere workers() with the include("source.jl") syntax, Julia throws an error, as follows.

Let’s have a source.jl file containing the following code:

x_worker = $x

Now, when I run the following code,

using Distributed

x = 2
@everywhere workers() include("source.jl")

Julia throws the following error:

LoadError: syntax: "$" expression outside quote around source.jl:1

But when I change the code to

using Distributed

x = 2
@everywhere workers() x_worker = $x
@fetchfrom 2 x_worker

the code works as expected and returns 2. I am very confused by this behavior. What is the correct procedure for accessing variables defined on the master process from a worker when using the include syntax?

Thank you so much for your help.

The reason you get the error message is as follows.

@everywhere is a macro. One consequence of this is that its arguments are not evaluated/run before being fed to the macro. Rather, the macro receives just the text of the arguments, in a more convenient form. The only thing a macro can do is rewrite its arguments into some other Julia code.

The $x signifies to the @everywhere macro that the value of x (which is 2 in your case) should be put into the expression, producing x_worker = 2, rather than the name x. This expression is sent to each of the workers. Without the $, the expression x_worker = x would be sent (literally, in a more convenient form), and the workers would complain that they have no x.
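You can see the same mechanism with ordinary quoted expressions, without any parallelism involved. A minimal sketch (the variable names here are just for illustration):

```julia
x = 2

# With $, the *value* of x is spliced into the expression when it is built,
# so the resulting expression no longer mentions the name x at all:
with_interp = :(x_worker = $x)
# with_interp is now the expression :(x_worker = 2)

# Without $, the expression refers to the *name* x, which must be defined
# wherever the expression is eventually evaluated (e.g. on a worker):
without_interp = :(x_worker = x)
# without_interp is the expression :(x_worker = x)
```

This is exactly what @everywhere does with its argument before shipping it to the workers.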

Now, if you put the $x inside the file, the macro @everywhere will not see it. It only sees the literal expression and does not attempt to evaluate it. So it merely sends the expression include("source.jl") to each of the workers. The workers include the file, encounter the $x, and complain, just as you would see if you simply ran include("source.jl") at the Julia prompt.

I do not use Distributed regularly, and don’t want to comment on better ways than your working example.


This has nothing to do with include. Just define the variable on that worker using @everywhere or @spawnat, e.g.

@everywhere x=42
@fetchfrom 2 x

Sorry, but that doesn’t solve the problem. If I have a large array that I want to access from each worker, defining it everywhere using @everywhere would increase memory requirements by a lot.

Yes, but this is a different problem now 🙂

The issue with using different workers from Distributed.jl is that they are separate processes with separate memory (and in fact they do not even have to live on the same physical machine!).
The simple way would be to use threads, if you don’t need the multi-machine parallelism.
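The threads route can be sketched as follows (the array and the doubling operation are just placeholders); threads share memory, so the large array is never copied. Start Julia with multiple threads, e.g. julia -t 4:

```julia
# The large array, allocated once and visible to every thread:
A = rand(10^6)
out = similar(A)

# Each thread processes a chunk of the indices; all threads read the same A
# directly from shared memory, so no data movement is needed:
Threads.@threads for i in eachindex(A)
    out[i] = 2 * A[i]
end
```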
If you really need that, then you need to think carefully about data movement.
There are some options:

  • Can you perhaps construct this large Array directly on the worker?
  • To move it to just one other worker, you can use @spawnat to do something on a specific worker, e.g.
@spawnat 2 mylargearray = $mylargearray
  • If you need to share a large Array with multiple workers on the same machine and still want to use Distributed.jl, you can have a look at SharedArrays.jl
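The SharedArrays option from the last bullet might look like this (a minimal sketch; the size and the fill operation are arbitrary):

```julia
using Distributed, SharedArrays
addprocs(2)

# A SharedArray is backed by shared memory visible to all workers on this
# machine, so each worker accesses the same data without a per-worker copy:
S = SharedArray{Float64}(10)

# Each worker fills its chunk of indices directly:
@sync @distributed for i in eachindex(S)
    S[i] = i
end
```

Note that this only works for workers on the same physical machine, since the backing memory is shared via the OS.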

Alternative solution: Use Dagger.jl, which has proper data-management utilities built-in.

If you want to partition an array A automatically across your workers, just do DA = DArray(A). If you want to make one big array on only a single worker (which I don’t recommend), you can do DA = fetch(Dagger.@spawn scope=Dagger.scope(worker=w) DArray(A, Blocks(size(A)...))) (where w is the worker you want to copy the array to, and where it will live).


In general, I would move your thinking away from “running scripts”, where you want to substitute some values into global variables in the script in order to control its operation.

Instead, write functions (and ideally, for any nontrivial amount of code, put the functions in a module in a package), and then call the functions with different parameters to control their behavior.
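A minimal sketch of that approach (the module name MyAnalysis and the computation are hypothetical; in a real project the module would live in a package that each worker loads with using):

```julia
using Distributed
addprocs(2)

# Define the module on every process:
@everywhere module MyAnalysis
    export run_analysis
    # Parameters arrive as explicit function arguments, not as globals
    # spliced into a script:
    run_analysis(x) = x^2
end

@everywhere using .MyAnalysis

# The master controls the computation by choosing the arguments:
results = pmap(run_analysis, 1:10)
```

This avoids the interpolation question entirely: nothing in the worker code refers to a global on the master, so there is nothing to splice in.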