I am preparing my code to run on a cluster using SLURM, exploiting multiple cores via the Distributed package. The setup is a bit involved, and I haven't been able to reproduce the error in an MWE.
The goal is to run a number of simulations with `simulate`, each from a different initial state (randomly sampled in this part of the project). I then want to stack the DataFrames the simulations produce into a single one (hence the `(vcat)` reducer and the assignment `df = ...`).
This fails with an error saying that `randomInitialState()` is not defined on worker 2.
However, if I run the for loop inline (in Juno), the code runs as expected.
Also, if I remove `@sync`, the `(vcat)` reducer, and the assignment of the result to `df`, the code runs.
I find this behavior surprising, and I haven't found anyone else whose code breaks when adding `@sync`. I wonder if anyone has suggestions as to what might be going on.
A sketch of the program:

```julia
pids = addprocs(2)

@everywhere begin
    using DataFrames
    include("loadsDataAndCreates_randomPath_below.jl")
    parameters = 1.0
end

df = @sync @distributed (vcat) for i in eachindex(1:num_simulations)
    state0_i = randomInitialState()
    df_i = simulate(state0_i, parameters)  # This returns a DataFrame
end

rmprocs(pids)
```
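For comparison, here is a sketch of the variant that does run, with `@sync`, the `(vcat)` reducer, and the assignment to `df` removed as described above. Note that in this form the per-simulation DataFrames are never collected, and `@distributed` without a reducer returns immediately without waiting for the workers:

```julia
pids = addprocs(2)

@everywhere begin
    using DataFrames
    include("loadsDataAndCreates_randomPath_below.jl")
    parameters = 1.0
end

# Runs without the UndefVarError, but discards each df_i
@distributed for i in eachindex(1:num_simulations)
    state0_i = randomInitialState()
    df_i = simulate(state0_i, parameters)
end

rmprocs(pids)
```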
The error is long, but starts with

```
nested task error: On worker 2:
UndefVarError: #randomInitialState#229 not defined
```
I am at a loss as to how to even Google how serialization (a new topic for me) might be interfering with my code.