I’m trying to run my code on multiple processes. It was working before, but after I updated Atom it no longer works: I’m getting “failed to precompile” errors when I try to use @everywhere to load a package.
Any tips? The underlying errors seem to be IOErrors like “mkdir: file already exists”.
Maybe each worker is trying to compile packages concurrently?
I don’t have a clean solution for this, but you could try using Package on the master process to precompile the package before attempting @everywhere using Package. If the workers are distributed across multiple machines, you’ll probably want to precompile on just one worker per host before loading on all remote workers.
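For example, a minimal sketch of that first pattern (Package is just a placeholder for whatever you’re loading):
using Distributed
addprocs(4)

using Package              # load/precompile on the master process first
@everywhere using Package  # workers then pick up the existing cache files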
Thanks, it was Distributed. I ended up deleting all my local packages (.julia/packages) and their compiled files (.julia/compiled), restarting Julia, precompiling on the main process, and then starting the workers.
Glad it’s fixed, although it’s worth mentioning that it couldn’t have been quite that simple. When you run @everywhere using Package, Julia first loads/precompiles the package on the main process only, and only once that is done does it load the package on the workers, exactly to avoid such a race condition (this is why loading a package takes about twice as long when you have workers).
Yes, I haven’t had an issue with local workers on the same machine (presumably because all workers share the same compile location). But won’t a race still exist for remote workers on each remote host?
You’re right, that’s a good point: if your remote machines didn’t have a shared filesystem and you had more than one worker per machine, those workers could end up precompiling simultaneously and cause problems.
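(A rough sketch of that per-host workaround, in case it helps; the package name CSV is only a placeholder, and this assumes the remote workers have already been added with addprocs:)
using Distributed

host_of(w) = remotecall_fetch(gethostname, w)      # host name each worker runs on
byhost = Dict{String,Vector{Int}}()
for w in workers()
    push!(get!(byhost, host_of(w), Int[]), w)      # group worker ids by host
end
firsts = [first(ws) for ws in values(byhost)]      # one worker per host

@everywhere firsts using CSV   # precompile the cache once on each host
@everywhere using CSV          # the remaining workers reuse the cache files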
FWIW, I was using Distributed on a local machine, and with @everywhere the workers failed with “ERROR: LoadError: IOError: unlink: resource busy or locked (EBUSY)” or “Cannot write cache file C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji” when they tried to load the active project’s module. The second time I ran it, it worked.
From worker 7: ERROR: LoadError: IOError: unlink: resource busy or locked (EBUSY)
From worker 7: Stacktrace:
From worker 7: [1] uv_error at .\libuv.jl:97 [inlined]
From worker 7: [2] unlink(::String) at .\file.jl:888
From worker 7: [3] rm(::String; force::Bool, recursive::Bool) at .\file.jl:268
From worker 7: [4] create_expr_cache(::String, ::String, ::Array{Pair{Base.PkgId,UInt64},1}, ::Base.UUID) at .\loading.jl:1149
From worker 7: [5] compilecache(::Base.PkgId, ::String) at .\loading.jl:1261
From worker 7: [6] _require(::Base.PkgId) at .\loading.jl:1029
From worker 7: [7] require(::Base.PkgId) at .\loading.jl:927
From worker 7: [8] require(::Module, ::Symbol) at .\loading.jl:922
From worker 7: [9] include(::Module, ::String) at .\Base.jl:377
From worker 7: [10] top-level scope at none:2
From worker 7: [11] eval at .\boot.jl:331 [inlined]
From worker 7: [12] eval(::Expr) at .\client.jl:449
From worker 7: [13] top-level scope at .\none:3
From worker 7: in expression starting at C:\Users\nicho\.julia\packages\LocalizationMicroscopy\2eAcs\src\LocalizationMicroscopy.jl:3
From worker 7: ERROR: LoadError: Failed to precompile LocalizationMicroscopy [798e6d30-19ab-11e9-36f8-37cfc5a8426e] to C:\Users\nicho\.julia\compiled\v1.4\LocalizationMicroscopy\bX0nX_CqdxD.ji.
From worker 7: Stacktrace:
From worker 7: [1] error(::String) at .\error.jl:33
From worker 7: [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
From worker 7: [3] _require(::Base.PkgId) at .\loading.jl:1029
From worker 7: [4] require(::Base.PkgId) at .\loading.jl:927
From worker 7: [5] require(::Module, ::Symbol) at .\loading.jl:922
From worker 7: [6] include(::Module, ::String) at .\Base.jl:377
From worker 7: [7] top-level scope at none:2
From worker 7: [8] eval at .\boot.jl:331 [inlined]
From worker 7: [9] eval(::Expr) at .\client.jl:449
From worker 7: [10] top-level scope at .\none:3
From worker 7: in expression starting at C:\Users\nicho\source\repos\SMLMAssociationAnalysis_NCB.jl\src\SMLMAssociationAnalysis_NCB.jl:4
Other workers had different errors:
From worker 6: Cannot write cache file "C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji".
From worker 6: ERROR: LoadError: Failed to precompile CategoricalArrays [324d7699-5711-5eae-9e2f-1d82baa6b597] to C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji.
From worker 6: Stacktrace:
From worker 6: [1] error(::String) at .\error.jl:33
From worker 6: [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
From worker 6: [3] _require(::Base.PkgId) at .\loading.jl:1029
From worker 6: [4] require(::Base.PkgId) at .\loading.jl:927
From worker 6: [5] require(::Module, ::Symbol) at .\loading.jl:922
From worker 6: [6] include(::Module, ::String) at .\Base.jl:377
From worker 6: [7] top-level scope at none:2
From worker 6: [8] eval at .\boot.jl:331 [inlined]
From worker 6: [9] eval(::Expr) at .\client.jl:449
From worker 6: [10] top-level scope at .\none:3
From worker 6: in expression starting at C:\Users\nicho\.julia\packages\CSV\vyG0T\src\CSV.jl:17
From worker 6: ERROR: LoadError: Failed to precompile CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b] to C:\Users\nicho\.julia\compiled\v1.4\CSV\HHBkp_CqdxD.ji.
From worker 6: Stacktrace:
From worker 6: [1] error(::String) at .\error.jl:33
From worker 6: [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
From worker 6: [3] _require(::Base.PkgId) at .\loading.jl:1029
From worker 6: [4] require(::Base.PkgId) at .\loading.jl:927
From worker 6: [5] require(::Module, ::Symbol) at .\loading.jl:922
From worker 6: [6] include(::Module, ::String) at .\Base.jl:377
From worker 6: [7] top-level scope at none:2
From worker 6: [8] eval at .\boot.jl:331 [inlined]
From worker 6: [9] eval(::Expr) at .\client.jl:449
From worker 6: [10] top-level scope at .\none:3
From worker 6: in expression starting at C:\Users\nicho\.julia\packages\LocalizationMicroscopy\2eAcs\src\LocalizationMicroscopy.jl:3
From worker 6: ERROR: LoadError: Failed to precompile LocalizationMicroscopy [798e6d30-19ab-11e9-36f8-37cfc5a8426e] to C:\Users\nicho\.julia\compiled\v1.4\LocalizationMicroscopy\bX0nX_CqdxD.ji.
From worker 6: Stacktrace:
From worker 6: [1] error(::String) at .\error.jl:33
From worker 6: [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
From worker 6: [3] _require(::Base.PkgId) at .\loading.jl:1029
From worker 6: [4] require(::Base.PkgId) at .\loading.jl:927
From worker 6: [5] require(::Module, ::Symbol) at .\loading.jl:922
From worker 6: [6] include(::Module, ::String) at .\Base.jl:377
From worker 6: [7] top-level scope at none:2
From worker 6: [8] eval at .\boot.jl:331 [inlined]
From worker 6: [9] eval(::Expr) at .\client.jl:449
From worker 6: [10] top-level scope at .\none:3
From worker 6: in expression starting at C:\Users\nicho\source\repos\SMLMAssociationAnalysis_NCB.jl\src\SMLMAssociationAnalysis_NCB.jl:4
Ugh, this just happened again after running pkg> update.
From various workers:
ERROR: LoadError: IOError: unlink: resource busy or locked (EBUSY)
ERROR: LoadError: LoadError: Failed to precompile GR [28b8d3ca-fb5f-59d9-8090-bfdbd6d07a71] to C:\Users\nicho\.julia\compiled\v1.4\GR\NDU5Y_CqdxD.ji.
I also see this both on Travis CI and on my local machine when using Julia 1.4.2.
And again… If anyone tries to use my code, they’re probably going to encounter this problem. And having 16 workers all trying to compile simultaneously stalls the system.
Any ideas where it might be coming from?
Could it be that the workers are running in a different environment than the main process, so the precompilation that happens on the main process doesn’t actually create the needed precompile files, and then the workers all try to precompile in unison and you get this error?
Hmm… I’m not sure how to check that. I’m running ] activate . in the REPL and initializing the workers with addprocs(exeflags="--project").
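(For reference, that setup is roughly the following sketch; the worker count is arbitrary:)
using Pkg
Pkg.activate(".")                    # same effect as `] activate .` in the REPL
using Distributed
addprocs(4; exeflags="--project")    # workers start with the same project environment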
Sounds like that wouldn’t be the case based on that, but as a sanity check,
@everywhere (using Pkg; println(Pkg.API.Context().env.project_file))
should tell you.
If this is reproducible, I recommend filing an issue in the Julia repo; @everywhere using Foo with all workers on a local machine sharing the same filesystem should not be causing any precompile errors.
Thanks. It seems this was caused by precompilation. Precompiling on the master process before loading on the worker processes works.
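(Roughly that workflow, as a sketch; the package name and worker count are taken from earlier in this thread, so adjust them to your project:)
using Pkg
Pkg.activate(".")
Pkg.precompile()    # rebuild stale dependency caches on the master, e.g. after `pkg> update`

using Distributed
using SMLMAssociationAnalysis_NCB          # compile the project module itself on the master
addprocs(16; exeflags="--project")
@everywhere using SMLMAssociationAnalysis_NCB   # workers reuse the caches instead of rebuilding them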
Just to be clear, having to do this by hand is a bug, unless you’re in a somewhat unusual distributed setup (e.g. different environments or a non-shared filesystem). using Foo or @everywhere using Foo should be doing this for you.
Maybe also compare DEPOT_PATH on the master and the workers:
using Distributed
addprocs()
DEPOT_PATH                               # on the master
remotecall_fetch(() -> DEPOT_PATH, 2)    # on worker 2
and compare load paths:
Base.load_path()                         # on the master
remotecall_fetch(Base.load_path, 2)      # on worker 2
Both DEPOT_PATH and Base.load_path() are identical between the main process and workers.