Error loading module on remote workers


#1

I’m trying to upgrade from Julia v0.4 to v0.5.

On v0.4 I just added remote workers and then typed using Module to make Module available on all workers.

I have a clean install of Julia 0.5 on the remote machine; no packages have been installed there.
On v0.5 I now get the error below when I run using Formatting.

Do I need to install packages on remote machines? I thought they were serialized from master?
Is there some other issue?

ERROR: LoadError: On worker 2:
LoadError: LoadError: SystemError: opening file C:\Users\plowman\.julia\lib\v0.5\Compat.ji: No such file or directory
 in #systemerror#51 at .\error.jl:34 [inlined]
 in systemerror at .\error.jl:34
 in open at .\iostream.jl:89
 in open at .\iostream.jl:101
 in stale_cachefile at .\loading.jl:659
 in _require_search_from_serialized at .\loading.jl:214
 in require at .\loading.jl:371
 in include_string at .\loading.jl:441
 in include_from_node1 at .\loading.jl:491
 in eval at .\boot.jl:234
 in require at .\loading.jl:415
 in include_string at .\loading.jl:441
 in include_from_node1 at .\loading.jl:491
 in eval at .\boot.jl:234
 in #775 at .\multi.jl:1909
 in #624 at .\multi.jl:1417
 in run_work_thunk at .\multi.jl:1001
 in run_work_thunk at .\multi.jl:1010 [inlined]
 in #599 at .\event.jl:68
while loading C:\Users\plowman\.julia\v0.5\Formatting\src\Formatting.jl, in expression starting on line 10
while loading F:\My Documents\S6\Scorpion\Julia v0.5\src\Main.jl, in expression starting on line 6
 in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1070
 in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1062
 in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1080
 in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at .\multi.jl:1130
 in wait(::Future) at .\multi.jl:1145
 in require(::Symbol) at .\loading.jl:413
 in include_from_node1(::String) at .\loading.jl:488
while loading F:\My Documents\S6\Scorpion\Julia v0.5\src\Launcher.jl, in expression starting on line 43

#2

This is perhaps a simpler example:

addprocs(...)
@everywhere using Compat

This works on v0.4, but on v0.5 it produces the following error. It seems to be looking for a compiled .ji file for Compat.
In both versions, no packages are installed on the remote workers.

ERROR: On worker 2:
SystemError: opening file C:\Users\plowman\.julia\lib\v0.5\Compat.ji: No such file or directory
 in #systemerror#51 at .\error.jl:34 [inlined]
 in systemerror at .\error.jl:34
 in open at .\iostream.jl:89
 in open at .\iostream.jl:101
 in stale_cachefile at .\loading.jl:659
 in _require_search_from_serialized at .\loading.jl:214
 in require at .\loading.jl:371
 in eval at .\boot.jl:234
 in #1 at .\multi.jl:1957
 in #627 at .\multi.jl:1421
 in run_work_thunk at .\multi.jl:1001
 in macro expansion at .\multi.jl:1421 [inlined]
 in #626 at .\event.jl:68
 in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at .\multi.jl:1070
 in remotecall_fetch(::Function, ::Base.Worker) at .\multi.jl:1062
 in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64) at .\multi.jl:1080
 in remotecall_fetch(::Function, ::Int64) at .\multi.jl:1080
 in (::##2#4)() at .\multi.jl:1959
 in sync_end() at .\task.jl:311
 in macro expansion; at .\multi.jl:1968 [inlined]
 in anonymous at .\<missing>:?

Is this supposed to work on v0.5? Has something changed since v0.4?


#3
  • Can you load Compat locally on the master machine? Locally on the remote?
  • I believe that using Module loads Module on the master and on all workers, but doesn’t import the exported names on the workers. So when I run @everywhere using, there’s a warning about replacing a module, because it has already been loaded:
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> @everywhere using Compat
WARNING: replacing module Compat.
WARNING: Method definition redirect_stdin(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition take!(Base.AbstractIOBuffer) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1713 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1713.
WARNING: Method definition redirect_stderr(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition redirect_stdout(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition isnull(Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1693 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1693.

julia>

#4

I’m not sure what changed from 0.4 to 0.5. The following works for me on 0.5:

import myModule #Runs only on master process
@everywhere using myModule

where myModule does not need to be installed on worker machines.

For more info check out the following two threads on the old Google-groups julia-users list:
https://groups.google.com/forum/#!topic/julia-users/k00-tUn_An8
https://groups.google.com/forum/#!searchin/julia-users/@everywhere$20using%7Csort:relevance/julia-users/UXrv1YNbYqY/9fEyfa_ECQAJ

The second discussion is referenced from the first.


#5

Thanks for your replies.

I don’t think my problem is related to the known “@everywhere using” issue.

I have done some more experimenting to clarify the problem:

  • The error only occurs when loading a module on workers on remote machines (everything is OK when the workers are local, on the same machine).

  • The error seems to have been introduced between v0.4.5 and v0.4.7 and persists into v0.5.0. That is, the module loads correctly on v0.4.5, but errors on v0.4.7 and v0.5.0.

Here’s an example that hopefully is more revealing:

module TestModule
    export test
    #test(x) = println(x)
    using Formatting
    test(x) = println(format(x, commas=true))
end
addprocs(...) # add worker on *remote* machine 
using TestModule
test(1000)                          # this runs locally as expected for all versions
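remotecall_fetch(pid, test, 1000)   # this works for v0.4.5 but errors on v0.4.7, v0.5.0
# pid is the id of the remote worker; this is the v0.4 argument order, v0.5 prefers remotecall_fetch(test, pid, 1000)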

When TestModule does not have using Formatting, everything works on all versions.

When TestModule does have using Formatting, code works on v0.4.5 only:

1,000
        From worker 2:  1,000

On v0.4.7 and v0.5.0 I get the error as previously posted:

ERROR: LoadError: On worker 2:
LoadError: LoadError: SystemError: opening file C:\Users\Greg\.julia\lib\v0.4\Compat.ji: No such file or directory
...

Maybe there is a problem with serialising dependent modules to remote workers?

Can anyone else reproduce this issue?


#6

I’ve tried to see which of the files in the error backtrace changed between v0.4.5 and v0.4.7.
multi.jl and iostream.jl did not change, but loading.jl has some changes.
PR #18230 changes loading.jl in v0.4.7 (apparently a backport of PR #18150).
Do you think this might be the cause of my issue?


#7

I could not reproduce the issue. I tried on a Linux cluster running Julia 0.5.0, using workers on a different machine but with an identical OS installed. The exact sequence of steps I followed:

  1. Generate a Julia package called TestModule and put your code there:
module TestModule
    export test
    #test(x) = println(x)
    using Formatting
    test(x) = println(format(x, commas=true))
end
  2. Close and restart Julia.
  3. Add workers on a different machine by running addprocs(["machineID1";"machineID2"])
  4. Run:
using TestModule
test(1000)  #Runs locally, as expected
remotecall_fetch(test, 3, 1000) # where 3 is the pid of a worker on another machine

Everything ran without error. A lot of my own code, and code from other projects that I use in my work, consists of modules that use other modules. That hasn’t caused any errors on my research group’s Linux cluster with Julia 0.5.0. Based on the error you posted, it looks like you’re running Windows. Perhaps it’s a Windows-specific bug, or an issue with Windows not playing well with whatever OS is running on your remote worker? It might be worth filing an issue on GitHub.
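
One way to compare the environments of the master and the workers is to collect a few basic facts from each process. This is only a minimal sketch: describe_process is a hypothetical helper name, not something from Julia or from this thread, and it uses nothing beyond Base.

```julia
# Hypothetical helper: gather basic facts about the process it runs on.
# VERSION, Sys.KERNEL, Sys.ARCH and homedir() are all provided by Base.
describe_process() = (VERSION, Sys.KERNEL, Sys.ARCH, homedir())

# Local check on the current process:
println(describe_process())
```

With workers attached, the helper could be defined on every process with @everywhere and then called with remotecall_fetch(describe_process, p) for each p in procs() (on current Julia these functions live in the Distributed standard library). Differences in Julia version, OS or home directory between master and worker would then be visible side by side.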


#8

Thanks Patrick for taking the time to test this.
Yes, I’m using Windows 7, so maybe it’s Windows-specific.
Could I ask whether you have Compat installed on the remote worker?
I’ll open an issue on GitHub.

Some more information (before I forget, and the reason for asking whether you have Compat installed).

If I have Compat installed and precompiled on the remote worker then everything works. Interestingly:

  • Compat needs to be precompiled (the worker looks for Compat.ji).
  • It seems to ignore the location of the package directory (as reported by Pkg.dir()) on the remote worker. I’m guessing it might be using the same path as Pkg.dir() on the master process.
  • The Formatting package does not need to be installed on the remote machine (only its dependency Compat).
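
The observations above could be checked with a small helper that reports whether a precompiled cache file for a package exists in a given set of directories. This is a sketch under stated assumptions, not a definitive diagnostic: cachefile_status is a hypothetical name, and the directories to inspect have to be supplied by the caller.

```julia
# Hypothetical helper, not part of Julia or any package.
# For each directory in `dirs`, return the candidate cache file for `pkg`
# paired with whether that file currently exists.
function cachefile_status(pkg, dirs)
    [(joinpath(d, string(pkg, ".ji")), isfile(joinpath(d, string(pkg, ".ji")))) for d in dirs]
end

# Example on the local process, using a throwaway directory:
dir = mktempdir()
println(cachefile_status("Compat", [dir]))   # no Compat.ji in the directory yet
touch(joinpath(dir, "Compat.ji"))            # create an empty placeholder file
println(cachefile_status("Compat", [dir]))   # the file is now reported as present
```

On a v0.5 session with workers, the helper could be defined with @everywhere and then evaluated on worker 2 with something like remotecall_fetch(() -> cachefile_status("Compat", Base.LOAD_CACHE_PATH), 2). Base.LOAD_CACHE_PATH is an internal variable of the v0.4/v0.5 loader, so treat its use here as an assumption. Comparing the result with the same call on the master would show which cache path each process actually consults.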

#9

Opened an issue on GitHub: https://github.com/JuliaLang/julia/issues/19960


#10

The Julia install I tested on didn’t have Compat installed. I overlooked that.