Struggling with parallel code in MCMC simulation

Hi!

I’m really struggling with parallel code execution in Julia. I have implemented an MCMC sampler that depends on packages such as Distributions, Random, etc., as well as two other Julia files that I have written. Things work well with serial execution. I have also followed the docs and gotten the parallel examples to work with DummyModule for code availability, but I cannot get my full code to run in parallel.

I am very confused about the proper way to make code available on all worker processes. My code is structured like so:

SampleSources.jl --> RJMCMC.jl --> TransformImages.jl

SampleSources.jl is the main script that is run. Inside there is a function sample_sources_main() that looks like:

```julia
function sample_sources_main()
    #do some setup
    rngs = [MersenneTwister() for _ in 1:N_CHAINS]
    posterior, stats = collect(pmap(do_mcmc, rngs, on_error=identity))
    # write to disk
end
```

So I am running multiple independent simulations in parallel, and then combining them with the collect function.
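If do_mcmc returns a (posterior, stats) tuple for each chain, note that pmap returns a vector with one such tuple per chain. Destructuring that vector into two names therefore picks up the first two chains, not "all posteriors" and "all stats". A minimal sketch, using a hypothetical fake_chain in place of do_mcmc (with no workers added, pmap simply runs on the main process):

```julia
using Distributed

# Hypothetical stand-in for do_mcmc: each "chain" returns a (posterior, stats) tuple.
fake_chain(i) = (fill(i, 3), Dict(:accept_rate => i / 10))

results = pmap(fake_chain, 1:4)   # Vector with one tuple per chain

a, b = results                    # destructuring: a = chain 1, b = chain 2
posteriors = first.(results)      # posterior of every chain
stats = last.(results)            # stats of every chain
```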

Does just this module need to be included everywhere or its dependencies as well?

I have gotten things partly working by loading a separate file that I wrote to include everything on every process. The entire contents of the file are here:

```julia
@everywhere begin
    using Pkg
    Pkg.activate(".")
    include("<absolutepath>/src/SampleSources.jl")
    using .SampleSources
end
```

I could only get this to work with the absolute path. With include("src/SampleSources.jl") I’d receive the error: No file named <path>/src/src/SampleSources.jl. With include("SampleSources.jl") I’d receive the error: No file named <path>/SampleSources.jl. I run everything from the package directory above src. That’s another issue, but at least it isn’t blocking…
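The two error messages are consistent with how include resolves relative paths: on the calling process it resolves them against the directory of the file containing the include call (src/, hence src/src/), while on the workers (at least in older Julia versions) there is no such file context, so the worker's working directory is used instead. One way around this, sketched below under the assumption that the loader script sits next to the source file, is to build the absolute path once on the master and interpolate it into @everywhere with `$`. DemoSource is a hypothetical stand-in written to a temporary directory so the example runs on its own.

```julia
using Distributed

# Hypothetical stand-in for src/SampleSources.jl, written to a temp dir so the
# example is self-contained. In a real loader this would be
# joinpath(@__DIR__, "SampleSources.jl").
dir = mktempdir()
src_file = joinpath(dir, "DemoSource.jl")
write(src_file, "module DemoSource\nsquare(x) = x^2\nend\n")

addprocs(2)

# `$src_file` is evaluated on the master, so every process receives the same
# absolute path, independent of its own working directory.
@everywhere include($src_file)
@everywhere using .DemoSource

results = pmap(x -> DemoSource.square(x), 1:4)
```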

Finally, I execute the program by running julia -p 2 --project=Project.toml -L src/PLoad.jl and then calling SampleSources.sample_sources_main() in the REPL. Things finally started to work! However, I receive a KeyError in the collect function. When running this same code in “parallel” with just 1 worker (starting julia without the -p flag), I do not receive this error. Upon inspecting the dictionaries, it is clear that the key exists. However, the dictionary is keyed on an enum defined in RJMCMC.jl, a dependency of the main SampleSources file. Could this be the issue? I’m really struggling to figure this out. Since this function runs after the pmap call, isn’t it executed on the main process, where the code availability issue shouldn’t exist?
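The pmap call above passes on_error=identity. With that option, a chain that throws returns its exception object in place of a result instead of aborting pmap, so any later combining step may end up operating on exceptions rather than on (posterior, stats) pairs. A minimal sketch of checking for that before combining, with a hypothetical flaky_chain standing in for do_mcmc:

```julia
using Distributed

# Hypothetical chain function that fails for one input, to mimic a worker error.
flaky_chain(i) = i == 3 ? error("chain $i failed") : (posterior = i, stats = 2i)

# With on_error = identity, the exception replaces that chain's result.
results = pmap(flaky_chain, 1:4; on_error = identity)

failed = findall(r -> r isa Exception, results)        # indices of failed chains
good = [r for r in results if !(r isa Exception)]      # only successful chains
```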

I’m really not sure what the right workflow is for setting up code to run in parallel like this, and it seems I have almost gotten this to work without even knowing why.

Can you clarify your directory/package structure, and which of SampleSources.jl, RJMCMC.jl, and TransformImages.jl are just Julia source files vs. full modules? And what environment are you activating with Pkg.activate(".")?

If you can give a minimum working example with the same structure as your code, we can give more specific help, but in general a pattern like this should work:

Write a file "parallel_mcmc_source_code.jl" that includes everything you need to run the model:

```julia
using SomePackage
using AnotherPackage
include("my_helper_functions.jl")

function do_mcmc(rng, args...)
    # some code...
    return posterior, stats
end
```

And then a script "run_parallel_mcmc.jl" that actually executes it in parallel:

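```julia
using Distributed, Random
addprocs()
# executes parallel_mcmc_source_code.jl on each process, including any `using`
# statements or `include`-ed code
@everywhere include("parallel_mcmc_source_code.jl")

N_CHAINS = 4  # hypothetical: set to the number of chains you want
rngs = [MersenneTwister() for _ in 1:N_CHAINS]
results = pmap(do_mcmc, rngs)   # one (posterior, stats) tuple per chain
posteriors = first.(results)
stats = last.(results)
```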

Thank you so much for the reply.

The directory structure is as follows:

```
SourceInference.jl - Package? directory and cwd whenever I run files (i.e. julia src/SampleSources.jl)
    src
        SampleSources.jl - module
        RJMCMC.jl - module
        TransformImages.jl - module
        PLoad.jl - file containing the @everywhere block, loaded with the -L flag
    Project.toml
    Manifest.toml
```

All three of those Julia files are wrapped in modules because I thought that was required in order to use the functions within them from other files (it didn’t work when they weren’t inside a module). When I run Pkg.activate("."), it activates the project environment of SourceInference.jl.

I will work on refactoring the code in the way that you suggest, thank you. However, doesn’t the environment need to be activated on each process as well? Where would that happen?

Modules are not required to pull code from one source file into another; you can just use include for that. Modules create separate namespaces, which can be helpful in organizing larger projects, since they let the module’s caller access exported functions without crowding the workspace with all the module’s internals. However, even large packages often define only one module at the top level. See, for instance, Distributions.jl, which has a single top-level module and simply pastes in 20 source files one after another inside it.
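A minimal sketch of that single-module layout applied to this project, assuming hypothetical file names (rjmcmc.jl, transform_images.jl) and hypothetical contents (a MoveType enum and a flipimage helper). To keep it self-contained it writes the files to a temporary directory; in a real project they would sit in src/ next to the module file, and the included files contain no module blocks of their own.

```julia
# Hypothetical layout: one top-level module file that `include`s plain source files.
dir = mktempdir()
write(joinpath(dir, "rjmcmc.jl"), """
@enum MoveType Birth Death Update
movename(m::MoveType) = string(m)
""")
write(joinpath(dir, "transform_images.jl"), """
flipimage(img) = reverse(img; dims = 2)
""")
write(joinpath(dir, "SourceInference.jl"), """
module SourceInference
include("rjmcmc.jl")            # resolved relative to this file's directory
include("transform_images.jl")
export MoveType, Birth, Death, Update, movename, flipimage
end
""")

include(joinpath(dir, "SourceInference.jl"))
using .SourceInference

movename(Birth)
flipimage([1 2; 3 4])
```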

Having a dedicated project environment is useful if you’re writing a package, or you want to share your project with others in a reproducible way, or if your project needs to use particular versions of particular packages. Activating the environment doesn’t load packages itself, it just points Julia to where they should be loaded from when you run using SomePackage. If Julia can find all the required packages in its default environment (i.e. (v1.3) pkg> as opposed to (SourceInference) pkg>), you don’t need to activate the project on each worker.
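If the workers do need the project environment, one option is to start them with the same --project flag as the master instead of calling Pkg.activate inside @everywhere. A sketch using the exeflags keyword of addprocs (the worker count of 2 is arbitrary):

```julia
using Distributed

# Directory of the environment active on the master process
# (e.g. the SourceInference.jl folder after `Pkg.activate(".")`).
project_dir = dirname(Base.active_project())

# `exeflags` passes extra command-line flags to each new worker process,
# so every worker starts with the same project active.
addprocs(2; exeflags = "--project=$project_dir")

# Each worker reports its own active project; these should match the master's.
worker_projects = [remotecall_fetch(Base.active_project, w) for w in workers()]
```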

The tradeoff between modularity, reproducibility, and simplicity is a design decision for you to make based on your needs–it’s hard to give more specific advice without seeing your code. I suspect you may have your project “over-organized,” which is leading to some of your issues. I’d suggest trying to simplify it a bit and see if you can make it run just by include-ing your source files and using the default Julia environment. You can always package it up more rigorously if you need to later.


Thank you again for this.

The only reason that I am using a project environment is that I am testing my program locally as well as running it in parallel on a remote machine. All of the required packages can be found in the default environment, but wouldn’t this require doing Pkg.add(package) on both machines for each package? If not, please let me know, cuz I’d love to avoid activating the environment on every process. Thanks 🙂

Yes, then in your case it might make sense to use a project environment to keep your packages synced between the machines…though you can of course just manually install the required 3rd-party packages on the remote machine. You’d only have to do this once, and you could just do pkg> up to keep them at the latest version on both machines.

A little googling turned up this open issue, so you’re not the only one to run into this!

Yes, I think it is problematic because your module is not a proper Julia package (if it were, it would be importable without include). Because of this, different Julia processes can’t agree that their .SampleSources.RJMCMC are the “same” module. I think you can fix it by no longer using @enum or by creating a proper Julia package.
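One hypothetical way to follow the "stop using @enum" suggestion is to key the per-chain dictionaries on Symbols. Symbols are interned by name in every process, so a :birth created on a worker and sent back is the same object as the master's :birth, and lookups on the master work as expected. The move names and count_moves below are illustrative only; the counting is deterministic so the result is easy to check.

```julia
using Distributed
addprocs(2)

# Hypothetical per-chain summary: how often each move type was chosen,
# keyed by Symbol rather than by an @enum from an include-d module.
@everywhere function count_moves(seed::Int)
    moves = (:birth, :death, :update)
    counts = Dict(m => 0 for m in moves)
    for i in 1:100
        counts[moves[mod1(seed + i, 3)]] += 1   # deterministic stand-in for a random move
    end
    return counts
end

per_chain = pmap(count_moves, 1:4)                   # one Dict per chain
total = reduce((a, b) -> merge(+, a, b), per_chain)  # combine on the master
total[:birth]                                        # Symbol keys look up normally
```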


Amazing. Thank you for all the info. After refactoring my code, I also stopped receiving the KeyError. Thank you for your help!