Precompilation error using Distributed on HPC

I keep getting the following strange warning from each worker when running my code on multiple workers in an HPC environment:

Warning: Module Distributed with build ID … is missing from the cache. This may mean Distributed does not support precompilation but is imported by a module that does.

The code runs otherwise without error and I get the correct output. Furthermore, locally I can precompile and run the same code without issue, but I want to diagnose the issue before it causes problems down the road.

My code is structured as follows:

using Distributed
@everywhere using LQMRunner
@everywhere using LatticeQM.Operators
@everywhere using LatticeQM.Spectrum
@everywhere import LinearAlgebra: BLAS.set_num_threads, mul!, lmul!, Diagonal
@everywhere set_num_threads(1)

if length(ARGS) != 0
	@time startprcs(ARGS[1])
	println("FINISHED...SLEEPING.")
	sleep(3)
else
	error("No job was provided to the script.")
end

Here startprcs eventually calls a method in a submodule of LQMRunner that is written to use the Distributed workers. This submodule imports Distributed and relies in particular on myid() and remote_do().

EDIT: I have tried

using LQMRunner
@everywhere using LQMRunner

to force precompilation before the code is passed to the workers but I get the same warnings from all of my workers.

Did you find a solution? I’m facing a similar problem.

Which Julia version are you running?

1.10.2, packages are updated.

EDIT: My problem is solved. I had been changing DEPOT_PATH at the start of the running Julia session. That worked as expected when I was not using Distributed, but with Distributed I got the warnings from the OP. The only thing that worked was exporting JULIA_DEPOT_PATH before starting the Julia session.
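For anyone who cannot export the variable in the job script, a possible in-session variant is to hand JULIA_DEPOT_PATH to the workers when they are launched. This is only a sketch: it assumes the workers are started with addprocs and that the cluster manager in use honors the env keyword. The depot location here is a hypothetical temporary directory standing in for the real one.

```julia
using Distributed

# Hypothetical depot location; replace with the real depot on the cluster.
custom_depot = mktempdir()

# A trailing path separator asks Julia to append the default depots as well.
sep = Sys.iswindows() ? ";" : ":"

# Assumption: the manager passes `env` on to the worker processes.
pids = addprocs(1; env = ["JULIA_DEPOT_PATH" => custom_depot * sep])

# Ask the worker which depot it lists first.
worker_depot = remotecall_fetch(() -> first(Base.DEPOT_PATH), pids[1])
println("first depot on worker ", pids[1], ": ", worker_depot)
```

Exporting JULIA_DEPOT_PATH before launching Julia remains the simpler route, since it does not depend on how the workers are spawned.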

It is unclear to me why changing DEPOT_PATH doesn’t play nicely with Distributed…

Maybe it is because changes to DEPOT_PATH in an active session on the main process are not automatically reflected on remote workers, whereas running JULIA_DEPOT_PATH=... julia sets an environment variable that is inherited when the main process spawns remote workers?
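One way to test this guess is to compare what each process actually sees. The sketch below assumes two local workers as a stand-in for the HPC workers. It collects DEPOT_PATH from the main process and from every worker so that any mismatch is visible.

```julia
using Distributed

# Assumption: two local workers stand in for the HPC workers.
addprocs(2)

# Collect DEPOT_PATH as seen by every process (main process and workers).
depots = Dict(p => remotecall_fetch(() -> copy(Base.DEPOT_PATH), p) for p in procs())

# Print the results ordered by process id for easy comparison.
for (p, d) in sort(collect(depots); by = first)
    println("process ", p, ": ", d)
end
```

If the workers list different depots than the main process, they will look for precompile caches in a different place, which would fit the warning in the OP.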
