Hi!
I’m really struggling with parallel code execution in Julia. I have implemented an MCMC sampler that depends on packages such as Distributions, Random, etc., as well as two other Julia files that I have written. Things work well with serial execution. I have also followed the docs and gotten the parallel examples to work, using the DummyModule example for code availability, but I cannot get my full code to run in parallel.
I am very confused about the proper way to make code available on all cores. My code is structured like so:
SampleSources.jl --> RJMCMC.jl --> TransformImages.jl
SampleSources.jl is the main script that is run. Inside there is a function sample_sources_main() that looks like:
function sample_sources_main()
    # do some setup
    rngs = [MersenneTwister() for _ in 1:N_CHAINS]
    posterior, stats = collect(pmap(do_mcmc, rngs, on_error=identity))
    # write to disk
end
So I am running multiple independent simulations in parallel and then combining them with the collect function.
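For reference, the pattern described above can be sketched in a self-contained form. Here demo_mcmc is a hypothetical stand-in for do_mcmc, and the chain count, seeds, and returned fields are assumptions made for illustration only. Because on_error=identity makes pmap return a thrown exception in place of that chain's result, the sketch separates failures from successes before combining anything:

```julia
using Distributed
using Random

# Assumption: two local workers, similar to starting with `julia -p 2`.
nprocs() == 1 && addprocs(2)

@everywhere using Random

# Hypothetical stand-in for do_mcmc: draws from the given RNG and
# returns a named tuple with a "posterior" and some summary "stats".
@everywhere function demo_mcmc(rng::AbstractRNG)
    draws = randn(rng, 100)
    return (posterior = draws, stats = (n = length(draws),))
end

n_chains = 4  # illustrative value, not taken from the original code
rngs = [MersenneTwister(seed) for seed in 1:n_chains]

# With on_error = identity, an exception thrown on a worker is returned
# inline with the results instead of aborting the whole pmap call.
results = pmap(demo_mcmc, rngs; on_error = identity)

failed = filter(r -> r isa Exception, results)
succeeded = filter(r -> !(r isa Exception), results)

# Combine only the chains that actually finished.
posteriors = [r.posterior for r in succeeded]
stats = [r.stats for r in succeeded]
```

If failed is non-empty, inspecting its entries shows the error raised on the worker, which could otherwise surface later as a confusing failure in the combining step.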
Does just this module need to be included everywhere, or its dependencies as well?
I have gotten things to start to work by loading a different file which I have made to include things everywhere. The entire contents of the file are here:
@everywhere begin
    using Pkg
    Pkg.activate(".")
    include("<absolutepath>/src/SampleSources.jl")
    using .SampleSources
end
I could only get this to work with the absolute path. When it was include("src/SampleSources.jl") I’d receive the error: No file named <path>/src/src/SampleSources.jl. When it was include("SampleSources.jl") I’d receive the error: No file named <path>/SampleSources.jl. I’d run everything from the package directory above src. That’s another issue, but at least it is not blocking…
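One possible way to avoid hard-coding the absolute path is to build the path once on the main process and interpolate it into the @everywhere call, so every process receives the same full path regardless of its working directory. The sketch below is self-contained: DemoSources and demo_chain are hypothetical names, and the module file is written to a temporary directory purely so the example runs on its own. In a real project the path could instead be built with something like joinpath(@__DIR__, "SampleSources.jl"), assuming the loader file lives next to it.

```julia
using Distributed

# Assumption: two local workers sharing the same filesystem.
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for src/SampleSources.jl, written to a
# temporary directory so this sketch does not depend on any project.
dir = mktempdir()
path = joinpath(dir, "DemoSources.jl")
write(path, """
module DemoSources
demo_chain(seed) = seed^2
end
""")

# `path` is an absolute path computed on the main process; `\$path`
# interpolates its value into the expression sent to every process.
@everywhere include($path)

# The module now exists in Main on every process, so the function can
# be referenced by its qualified name in the pmap call.
results = pmap(DemoSources.demo_chain, 1:4)
```

Because the path is resolved once and passed by value, the result should not depend on which directory Julia was started from.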
Finally I’d execute the program by running julia -p 2 --project=Project.toml -L src/PLoad.jl and then executing SampleSources.sample_sources_main() in the shell. Things finally started to work! However, I’d receive a KeyError in the collect function. When running this same code in “parallel” with just 1 worker (starting julia without the “-p” flag) I do not receive this error. Upon inspecting the dictionaries, it is clear that the key exists. However, the dictionary is keyed on an enum defined in RJMCMC.jl, a dependency of the main SampleSources file. Could this be the issue? I’m really struggling to figure this out. Since this function is executed after the pmap call, isn’t it run on the main process, where the code availability issue shouldn’t exist?
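One way to test the enum hypothesis, without knowing yet whether it is the real cause, would be to convert the enum keys to Symbols before the dictionaries leave the worker and then look entries up by name on the main process. In this minimal sketch MoveType, its members, chain_stats, and to_symbol_keys are all hypothetical names standing in for whatever RJMCMC.jl actually defines:

```julia
# Hypothetical enum standing in for the one defined in RJMCMC.jl.
@enum MoveType birth death move

# Hypothetical per-chain statistics keyed on the enum.
chain_stats = Dict(birth => 10, death => 7, move => 83)

# Rebuild the dictionary with Symbol keys. Symbol(x) on an enum value
# gives the member's name, so lookups no longer involve the enum type.
to_symbol_keys(d::AbstractDict) = Dict(Symbol(k) => v for (k, v) in d)

portable = to_symbol_keys(chain_stats)
```

If the KeyError disappears once the results are keyed on Symbols, that would point at the enum keys; if it persists, the cause is more likely elsewhere.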
I’m really not sure what the right workflow is for setting up code to run in parallel like this, and it seems I have almost gotten this to work without even knowing why.