Reusing variable in multiple parallel calls


I am going to be calling the same FFT on several remote workers, many times each. To get an additional speed improvement, I am aiming to use plan_fft. But plan_fft objects aren’t thread safe, so I can’t reuse the same plan on every process (or so I assume — this is why I kept getting segmentation faults in code that was fine when run with only one child process). Ideally, I would like to define a plan on each process, using for example @everywhere plan = plan_fft(...). However, I’m not sure whether there’s a way to use @spawn or remotecall with a combination of local and remote variables.



I have never used fft in Julia; however, if you want to use some local variables from the caller process as arguments to plan_fft(...), with plan_fft(...) running on the workers, you might consider enumerating the workers explicitly with the workers() function and spawning your code with @spawnat. I’m not sure whether the arguments to plan_fft are thread safe, though.

I don’t see any problems with your approach using @everywhere. Note that segfaults can occur for a multitude of reasons, so don’t jump to conclusions too quickly on that one.



Thread safety is not a concern for parallel calls. You do need to create the plan on each worker, though.
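A minimal sketch of that, assuming the FFTW package is installed everywhere and a 64×64 array stands in for your real data:

```julia
using Distributed
addprocs(2)                       # two worker processes, as an example

@everywhere using FFTW

# Each process builds and stores its own plan as a process-local constant;
# nothing is shared or serialized between processes.
@everywhere const PLAN = plan_fft(zeros(ComplexF64, 64, 64))

# A function defined everywhere sees the PLAN of whichever process runs it.
@everywhere do_fft(x) = PLAN * x

# Run the planned FFT on worker 2.
result = remotecall_fetch(do_fft, 2, rand(ComplexF64, 64, 64))
```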



I’m not sure how to call the resulting plan on the workers. The plans get defined using @everywhere, but once a plan is defined on a worker I’m not sure how to use that variable on that worker. I can’t seem to find a way to feed “local” variables to @spawnat or remotecall.

I guess to be a bit more precise, here’s a toy example of what’s going on at the moment:

module A
  function mainLoop()
    ThePlan = plan_fft(dummyMatrix)
    for i = 1:ABillion
      @spawn actuallyDoStuff(data, ThePlan)
    end
  end

  function actuallyDoStuff(data, ThePlan)
    # ...
  end
end

I’m unsure how to define ThePlan in a way that is localized to a worker, but can be called via @spawn. I can’t define it in the module outside of a function, because I don’t have information on the user’s data at that point.
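One way around that — a sketch only, where the Ref-based init_plan helper, the FFTW usage, and the loop bound are my own placeholders, not your real code — is to give each process a mutable slot for the plan and fill it once the user’s data size is known:

```julia
using Distributed
addprocs(2)

@everywhere begin
    using FFTW
    const PLAN = Ref{Any}(nothing)     # process-local slot, filled later

    # Called once per worker, after the user's data size is known.
    init_plan(dims) = (PLAN[] = plan_fft(zeros(ComplexF64, dims...)); nothing)

    # Runs on a worker and applies that worker's own plan; the plan
    # itself never travels between processes.
    actuallyDoStuff(data) = PLAN[] * data
end

function mainLoop(data)
    for p in workers()
        remotecall_fetch(init_plan, p, size(data))
    end
    @sync for i in 1:4                 # stand-in for the big loop
        @spawn actuallyDoStuff(data)
    end
end

mainLoop(rand(ComplexF64, 32, 32))
```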



There may be other, better ways, but here are the two methods I’ve seen to compute and store a result on a worker process and then access that result on the worker later. I use the simple example of generating a vector on a worker and then computing the sum of its elements, also on the worker process.

The first approach is to compute your result on the remote worker and return a reference to it on the master process, then write a helper function that takes that reference as input, fetches it, and calls the real function. If you do the fetch on the process where the reference’s data is stored, the fetch is a no-op. For example, the following works:

julia> xref = @spawnat 2 ones(6)
Future(2, 1, 6, nothing)

julia> @everywhere remote_sum(x) = sum(fetch(x))

julia> s = remotecall_fetch(remote_sum,2,xref)
6.0

The other approach I know of is the one used by the ParallelDataTransfer package. Check out its readme for examples. Using their macros, you can do the above example with the following code:

julia> @everywhere using ParallelDataTransfer

julia> @defineat 3 x=ones(8)
Future(3, 1, 33, nothing)

julia> @defineat 3 s2 = sum(x)
Future(3, 1, 35, nothing)

julia> s2loc = @getfrom 3 s2
8.0

Note that in the first approach, if you want to store a value on a remote worker and later modify it, you need to put it in a RemoteChannel, rather than just getting the reference to the result as a Future.
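A short sketch of that last point, assuming one worker has been added (the channel handle rc and the update logic are illustrative):

```julia
using Distributed
addprocs(1)
w = workers()[1]

# The backing Channel lives on worker w; the handle can be used anywhere.
rc = RemoteChannel(() -> Channel{Vector{Float64}}(1), w)
put!(rc, ones(6))

# Modify the stored value on the worker itself: take it out, change it,
# and put the updated version back.
total = remotecall_fetch(w, rc) do chan
    v = take!(chan)
    v .+= 1.0
    put!(chan, v)
    sum(v)
end
```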