I keep getting the following strange warning from each worker when running my code on multiple workers in an HPC environment:
Warning: Module Distributed with build ID … is missing from the cache. This may mean Distributed does not support precompilation but is imported by a module that does.
The code runs otherwise without error and I get the correct output. Furthermore, locally I can precompile and run the same code without issue, but I want to diagnose the issue before it causes problems down the road.
My code is structured as follows:
using Distributed
@everywhere using LQMRunner
@everywhere using LatticeQM.Operators
@everywhere using LatticeQM.Spectrum
@everywhere import LinearAlgebra: BLAS.set_num_threads, mul!, lmul!, Diagonal
@everywhere set_num_threads(1)
if length(ARGS) != 0
    @time startprcs(ARGS[1])
    println("FINISHED...SLEEPING.")
    sleep(3)
else
    error("No job was provided to the script.")
end
Here, startprcs eventually calls a method in a submodule of LQMRunner that is written to use the Distributed workers. This submodule imports Distributed itself and, in particular, uses the myid() and remote_do() functions.
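To make the structure concrete, here is a minimal sketch of what such a submodule looks like. It is not the actual LQMRunner code: the module name WorkerTasks and the functions current_id and ping_workers are hypothetical stand-ins, and only myid(), workers(), and remote_do() are the real Distributed calls involved.

```julia
# Hypothetical stand-in for the LQMRunner submodule; all names here are illustrative.
module WorkerTasks

using Distributed  # the submodule imports Distributed itself

# Return the ID of the process that executes this call.
current_id() = myid()

# Ask every worker to print a message. remote_do is fire-and-forget:
# it returns nothing and does not wait for the remote call to finish.
function ping_workers()
    sender = myid()
    for w in workers()
        remote_do(println, w, "ping from process ", sender)
    end
    return length(workers())  # number of workers that were contacted
end

end # module WorkerTasks
```

With no extra workers added, workers() contains only process 1, so WorkerTasks.ping_workers() contacts the master process itself.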
EDIT: I have tried
using LQMRunner
@everywhere using LQMRunner
to force precompilation on the master process before the code is passed to the workers, but I still get the same warnings from all of my workers.
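One thing I can check, on the unconfirmed assumption that the warning depends on each worker's environment, is whether the workers see the same Julia version, depot path, and active project as the master. The helper name environment_report is my own; the calls it uses (procs, remotecall_fetch, DEPOT_PATH, Base.active_project) are standard Julia.

```julia
using Distributed

# Collect the Julia version, depot path, and active project from every
# process (master included) so they can be compared side by side.
function environment_report()
    report = Dict{Int,NamedTuple}()
    for p in procs()
        report[p] = remotecall_fetch(p) do
            (version = VERSION,
             depot = copy(DEPOT_PATH),
             project = Base.active_project())
        end
    end
    return report
end

# Print one line per process, ordered by process ID.
for (p, info) in sort(collect(environment_report()); by = first)
    println("process ", p, ": ", info)
end
```

If a worker reports a different depot or project than process 1, that would at least narrow down where the cache lookup differs.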