"Failed to precompile" error on worker processes

I’m trying to run my code on multiple processes. It was working before but I updated Atom and it no longer works. I’m getting “failed to precompile” errors when I try to use @everywhere to load a package.

Any tips? The base errors seem to be IOError like “mkdir: file already exists”

Maybe each worker is trying to compile packages concurrently?

I don’t have a clean solution for this, but you could try running using Package on the master process to precompile the package before attempting @everywhere using Package. If the workers are distributed across several machines, you’ll probably want to precompile on just one worker per host machine before loading on all remote workers.
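A minimal sketch of that workaround on a single machine. The standard library Dates is only a stand-in for the real package, and the worker count of 2 is arbitrary; both are assumptions for illustration. The point is the ordering: the package is loaded, and therefore precompiled if needed, on the master process before any worker exists.

```julia
using Distributed

# Load the package on the master process first. If no valid cache file
# exists yet, precompilation happens here, with no workers competing.
using Dates   # stand-in for the real package (assumption)

# Only now start the workers and load the package on them.
addprocs(2)   # worker count chosen for illustration
@everywhere using Dates

# Each worker can now use the package.
results = [remotecall_fetch(() -> Dates.year(Date(2020, 1, 1)), pid) for pid in workers()]
```

If the workers already exist when the package is first loaded, this ordering no longer guarantees anything, so the sketch deliberately calls addprocs last.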

Thanks, it was Distributed. I ended up deleting all my local packages (.julia/packages) and their compiled files (.julia/compiled), restarting Julia, precompiling in the main process, and then starting the workers.

Glad it’s fixed, although it’s worth mentioning that it couldn’t have been as simple as just this. When you run @everywhere using Package, Julia first loads/precompiles the package on the main process only, and only after that is done does it load the package on the workers, precisely to avoid such a race condition (this is why it takes 2x as long to load a package when you have workers).

Yes, I haven’t had an issue with local workers on the same machine (presumably because all workers share the same compile location). But won’t a race still exist for remote workers on each remote host?

You’re right, that’s a good point. If your remote machines didn’t have a shared filesystem and you had more than one worker per machine, those workers could end up precompiling simultaneously and cause problems.
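One way to sidestep that race could be to load the package on a single worker per host first and on everyone else afterwards. The following is a sketch under assumptions, not a tested recipe for a real cluster: the helper one_worker_per_host is hypothetical, Dates stands in for the real package, and the two local workers only illustrate the mechanics (on one machine they all share a hostname, so there is a single leader).

```julia
using Distributed

addprocs(2)   # local workers for illustration only

# Hypothetical helper: return one worker id per distinct hostname
# (the lowest id on each host).
function one_worker_per_host(pids = workers())
    first_on_host = Dict{String,Int}()
    for pid in sort(pids)
        host = remotecall_fetch(gethostname, pid)
        get!(first_on_host, host, pid)   # keeps the first pid seen per host
    end
    return sort!(collect(values(first_on_host)))
end

leaders = one_worker_per_host()

# Phase 1: evaluate `using` directly on one worker per host, in parallel
# across hosts, so that only one process per host writes the cache files.
@sync for pid in leaders
    @async remotecall_wait(Core.eval, pid, Main, :(using Dates))
end

# Phase 2: load everywhere; the remaining workers should find a valid cache.
@everywhere using Dates
```

Phase 1 uses remotecall_wait with Core.eval rather than @everywhere on purpose, so that nothing is loaded on the master (or pushed to the other workers) until the leaders have finished.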

FWIW, I was using Distributed on a local machine, and with @everywhere the workers failed with “Error: LoadError: IOError: unlink: resource busy or locked (EBUSY)” or “Cannot write cache file “C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji”” when it was trying to load the active project’s module on each worker. The second time I ran it, it worked.

      From worker 7:    ERROR: LoadError: IOError: unlink: resource busy or locked (EBUSY)
      From worker 7:    Stacktrace:
      From worker 7:     [1] uv_error at .\libuv.jl:97 [inlined]
      From worker 7:     [2] unlink(::String) at .\file.jl:888
      From worker 7:     [3] rm(::String; force::Bool, recursive::Bool) at .\file.jl:268
      From worker 7:     [4] create_expr_cache(::String, ::String, ::Array{Pair{Base.PkgId,UInt64},1}, ::Base.UUID) at .\loading.jl:1149
      From worker 7:     [5] compilecache(::Base.PkgId, ::String) at .\loading.jl:1261
      From worker 7:     [6] _require(::Base.PkgId) at .\loading.jl:1029
      From worker 7:     [7] require(::Base.PkgId) at .\loading.jl:927
      From worker 7:     [8] require(::Module, ::Symbol) at .\loading.jl:922
      From worker 7:     [9] include(::Module, ::String) at .\Base.jl:377
      From worker 7:     [10] top-level scope at none:2
      From worker 7:     [11] eval at .\boot.jl:331 [inlined]
      From worker 7:     [12] eval(::Expr) at .\client.jl:449
      From worker 7:     [13] top-level scope at .\none:3
      From worker 7:    in expression starting at C:\Users\nicho\.julia\packages\LocalizationMicroscopy\2eAcs\src\LocalizationMicroscopy.jl:3

      From worker 7:    ERROR: LoadError: Failed to precompile LocalizationMicroscopy [798e6d30-19ab-11e9-36f8-37cfc5a8426e] to C:\Users\nicho\.julia\compiled\v1.4\LocalizationMicroscopy\bX0nX_CqdxD.ji.
      From worker 7:    Stacktrace:
      From worker 7:     [1] error(::String) at .\error.jl:33
      From worker 7:     [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
      From worker 7:     [3] _require(::Base.PkgId) at .\loading.jl:1029
      From worker 7:     [4] require(::Base.PkgId) at .\loading.jl:927
      From worker 7:     [5] require(::Module, ::Symbol) at .\loading.jl:922
      From worker 7:     [6] include(::Module, ::String) at .\Base.jl:377
      From worker 7:     [7] top-level scope at none:2
      From worker 7:     [8] eval at .\boot.jl:331 [inlined]
      From worker 7:     [9] eval(::Expr) at .\client.jl:449
      From worker 7:     [10] top-level scope at .\none:3
      From worker 7:    in expression starting at C:\Users\nicho\source\repos\SMLMAssociationAnalysis_NCB.jl\src\SMLMAssociationAnalysis_NCB.jl:4


Other workers had different errors:

      From worker 6:    Cannot write cache file "C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji".
      From worker 6:    ERROR: LoadError: Failed to precompile CategoricalArrays [324d7699-5711-5eae-9e2f-1d82baa6b597] to C:\Users\nicho\.julia\compiled\v1.4\CategoricalArrays\RHXoP_CqdxD.ji.
      From worker 6:    Stacktrace:
      From worker 6:     [1] error(::String) at .\error.jl:33
      From worker 6:     [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
      From worker 6:     [3] _require(::Base.PkgId) at .\loading.jl:1029
      From worker 6:     [4] require(::Base.PkgId) at .\loading.jl:927
      From worker 6:     [5] require(::Module, ::Symbol) at .\loading.jl:922
      From worker 6:     [6] include(::Module, ::String) at .\Base.jl:377
      From worker 6:     [7] top-level scope at none:2
      From worker 6:     [8] eval at .\boot.jl:331 [inlined]
      From worker 6:     [9] eval(::Expr) at .\client.jl:449
      From worker 6:     [10] top-level scope at .\none:3
      From worker 6:    in expression starting at C:\Users\nicho\.julia\packages\CSV\vyG0T\src\CSV.jl:17

      From worker 6:    ERROR: LoadError: Failed to precompile CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b] to C:\Users\nicho\.julia\compiled\v1.4\CSV\HHBkp_CqdxD.ji.
      From worker 6:    Stacktrace:
      From worker 6:     [1] error(::String) at .\error.jl:33
      From worker 6:     [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
      From worker 6:     [3] _require(::Base.PkgId) at .\loading.jl:1029
      From worker 6:     [4] require(::Base.PkgId) at .\loading.jl:927
      From worker 6:     [5] require(::Module, ::Symbol) at .\loading.jl:922
      From worker 6:     [6] include(::Module, ::String) at .\Base.jl:377
      From worker 6:     [7] top-level scope at none:2
      From worker 6:     [8] eval at .\boot.jl:331 [inlined]
      From worker 6:     [9] eval(::Expr) at .\client.jl:449
      From worker 6:     [10] top-level scope at .\none:3
      From worker 6:    in expression starting at C:\Users\nicho\.julia\packages\LocalizationMicroscopy\2eAcs\src\LocalizationMicroscopy.jl:3

      From worker 6:    ERROR: LoadError: Failed to precompile LocalizationMicroscopy [798e6d30-19ab-11e9-36f8-37cfc5a8426e] to C:\Users\nicho\.julia\compiled\v1.4\LocalizationMicroscopy\bX0nX_CqdxD.ji.
      From worker 6:    Stacktrace:
      From worker 6:     [1] error(::String) at .\error.jl:33
      From worker 6:     [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1272
      From worker 6:     [3] _require(::Base.PkgId) at .\loading.jl:1029
      From worker 6:     [4] require(::Base.PkgId) at .\loading.jl:927
      From worker 6:     [5] require(::Module, ::Symbol) at .\loading.jl:922
      From worker 6:     [6] include(::Module, ::String) at .\Base.jl:377
      From worker 6:     [7] top-level scope at none:2
      From worker 6:     [8] eval at .\boot.jl:331 [inlined]
      From worker 6:     [9] eval(::Expr) at .\client.jl:449
      From worker 6:     [10] top-level scope at .\none:3
      From worker 6:    in expression starting at C:\Users\nicho\source\repos\SMLMAssociationAnalysis_NCB.jl\src\SMLMAssociationAnalysis_NCB.jl:4

Ugh, this just happened again after running pkg> update.

From various workers:

ERROR: LoadError: IOError: unlink: resource busy or locked (EBUSY)
ERROR: LoadError: LoadError: Failed to precompile GR [28b8d3ca-fb5f-59d9-8090-bfdbd6d07a71] to C:\Users\nicho\.julia\compiled\v1.4\GR\NDU5Y_CqdxD.ji.

I also see this both on Travis CI and on my local machine when using Julia 1.4.2.

And again… If anyone tries to use my code, they’re probably going to encounter this problem. And having 16 workers all trying to compile simultaneously stalls the system.

Any ideas where it might be coming from?

Could it be that the workers are running in a different environment than the main process, so the precompilation which happens on the main process doesn’t actually create the needed precompile files, and then the workers are all trying precompile in unison and you’re getting this error?

Hmm… I’m not sure how to check that. I’m running ] activate . in my REPL and initializing the workers with addprocs(exeflags="--project").

Sounds like it wouldn’t be the case based on that, but as a sanity check,

@everywhere (using Pkg; println(Pkg.API.Context().env.project_file))

should tell you.
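A variant of the same check that avoids Pkg internals, assuming Base.active_project() is an acceptable proxy for "which environment is this process using". It collects the value from every process, master included, so the results can be compared side by side. The worker count is arbitrary.

```julia
using Distributed

addprocs(2; exeflags="--project")   # same flag as in the post above

# Collect the active project file seen by every process (master included).
projects = Dict(pid => remotecall_fetch(Base.active_project, pid) for pid in procs())

for pid in sort(collect(keys(projects)))
    println("process ", pid, ": ", projects[pid])
end

# true when all processes agree on the project file
same_project = length(unique(values(projects))) == 1
```

If same_project is false, the workers are resolving packages in a different environment from the master, which would match the scenario suggested above.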

If this is reproducible, I recommend filing an issue in the Julia repo: @everywhere using Foo with all workers on a local machine sharing the same filesystem should not cause any precompile errors.

Thanks. It seems this was caused by precompilation. Precompiling on the master process before loading on the worker processes works.

Just to be clear, having to do this by hand is a bug, unless you’re in a somewhat unusual distributed setup (e.g. different environments or a non-shared filesystem). using Foo or @everywhere using Foo should be doing this for you.

I created an issue:

Maybe also compare DEPOT_PATH on master and workers:

using Distributed
addprocs()
DEPOT_PATH
remotecall_fetch(()->DEPOT_PATH, 2)

and compare load paths:

Base.load_path()
remotecall_fetch(Base.load_path, 2)
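The two checks above only look at worker 2. A small sketch that runs the same comparison against every worker; the helper name mismatched_workers is hypothetical and the worker count is arbitrary.

```julia
using Distributed

addprocs(2)

# Hypothetical helper: compare a zero-argument function's result on every
# worker with the master's result and return the ids of workers that differ.
function mismatched_workers(f)
    reference = f()
    return [pid for pid in workers() if remotecall_fetch(f, pid) != reference]
end

depot_mismatch = mismatched_workers(() -> DEPOT_PATH)
load_mismatch  = mismatched_workers(Base.load_path)

println("workers with a different DEPOT_PATH: ", depot_mismatch)
println("workers with a different load path:  ", load_mismatch)
```

An empty list in both cases means every worker sees the same depots and load path as the main process.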

Both DEPOT_PATH and Base.load_path() are identical between the main process and workers.
