I’m trying to upgrade from Julia v0.4 to v0.5.
On v0.4 I just added remote workers and then typed `using Module` to make `Module` available on all workers.
I have a clean install of Julia 0.5 on the remote machine; no packages have been installed.
On v0.5 I now get the following error when running `using Formatting`.
Do I need to install packages on remote machines? I thought they were serialized from master?
Is there some other issue?
ERROR: LoadError: On worker 2:
LoadError: LoadError: SystemError: opening file C:\Users\plowman\.julia\lib\v0.5\Compat.ji: No such file or directory
in #systemerror#51 at .\error.jl:34 [inlined]
in systemerror at .\error.jl:34
in open at .\iostream.jl:89
in open at .\iostream.jl:101
in stale_cachefile at .\loading.jl:659
in _require_search_from_serialized at .\loading.jl:214
in require at .\loading.jl:371
in include_string at .\loading.jl:441
in include_from_node1 at .\loading.jl:491
in eval at .\boot.jl:234
in require at .\loading.jl:415
in include_string at .\loading.jl:441
in include_from_node1 at .\loading.jl:491
in eval at .\boot.jl:234
in #775 at .\multi.jl:1909
in #624 at .\multi.jl:1417
in run_work_thunk at .\multi.jl:1001
in run_work_thunk at .\multi.jl:1010 [inlined]
in #599 at .\event.jl:68
while loading C:\Users\plowman\.julia\v0.5\Formatting\src\Formatting.jl, in expression starting on line 10
while loading F:\My Documents\S6\Scorpion\Julia v0.5\src\Main.jl, in expression starting on line 6
in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1070
in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1062
in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1080
in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at .\multi.jl:1080
in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at .\multi.jl:1130
in wait(::Future) at .\multi.jl:1145
in require(::Symbol) at .\loading.jl:413
in include_from_node1(::String) at .\loading.jl:488
while loading F:\My Documents\S6\Scorpion\Julia v0.5\src\Launcher.jl, in expression starting on line 43
This is perhaps a simpler example:
addprocs(...)
@everywhere using Compat
This works on v0.4, but on v0.5 it produces the following error. It seems to be looking for a compiled .ji file for Compat.
In both versions, no packages are installed on the remote workers.
ERROR: On worker 2:
SystemError: opening file C:\Users\plowman\.julia\lib\v0.5\Compat.ji: No such file or directory
in #systemerror#51 at .\error.jl:34 [inlined]
in systemerror at .\error.jl:34
in open at .\iostream.jl:89
in open at .\iostream.jl:101
in stale_cachefile at .\loading.jl:659
in _require_search_from_serialized at .\loading.jl:214
in require at .\loading.jl:371
in eval at .\boot.jl:234
in #1 at .\multi.jl:1957
in #627 at .\multi.jl:1421
in run_work_thunk at .\multi.jl:1001
in macro expansion at .\multi.jl:1421 [inlined]
in #626 at .\event.jl:68
in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at .\multi.jl:1070
in remotecall_fetch(::Function, ::Base.Worker) at .\multi.jl:1062
in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64) at .\multi.jl:1080
in remotecall_fetch(::Function, ::Int64) at .\multi.jl:1080
in (::##2#4)() at .\multi.jl:1959
in sync_end() at .\task.jl:311
in macro expansion; at .\multi.jl:1968 [inlined]
in anonymous at .\<missing>:?
Is this supposed to work on v0.5? Has something changed since v0.4?
- Can you load `Compat` locally on the master machine? Locally on the remote?
- I believe that `using Module` loads `Module` on master and all workers, but doesn’t import exported names on workers. So when I do `@everywhere using`, there’s a warning about replacing a module, because it’s already been loaded:
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32
julia> addprocs(1)
1-element Array{Int64,1}:
2
julia> @everywhere using Compat
WARNING: replacing module Compat.
WARNING: Method definition redirect_stdin(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition take!(Base.AbstractIOBuffer) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1713 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1713.
WARNING: Method definition redirect_stderr(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition redirect_stdout(Function, Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1615.
WARNING: Method definition isnull(Any) in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1693 overwritten in module Compat at C:\Users\Evan\.julia\v0.5\Compat\src\Compat.jl:1693.
julia>
I’m not sure what changed from 0.4 to 0.5. The following works for me on 0.5:
import myModule #Runs only on master process
@everywhere using myModule
where `myModule` does not need to be installed on worker machines.
For more info check out the following two threads on the old Google-groups julia-users list:
https://groups.google.com/forum/#!topic/julia-users/k00-tUn_An8
https://groups.google.com/forum/#!searchin/julia-users/@everywhere$20using%7Csort:relevance/julia-users/UXrv1YNbYqY/9fEyfa_ECQAJ
The second discussion is referenced from the first.
Thanks for your replies.
I don’t think my problem is related to the known “`@everywhere using`” issue.
I have done some more experimenting, and to clarify the problem:
- The error only occurs when loading a module on workers on remote machines (everything is OK when the workers are local, on the same machine).
- The error seems to have been introduced between v0.4.5 and v0.4.7 and persists into v0.5.0. That is, the module loads correctly on v0.4.5, but errors on v0.4.7 and v0.5.0.
Here’s an example that hopefully is more revealing:
module TestModule
export test
#test(x) = println(x)
using Formatting
test(x) = println(format(x, commas=true))
end
addprocs(...) # add worker on *remote* machine
using TestModule
test(1000) # this runs locally as expected for all versions
remotecall_fetch(pid, test, 1000) # this works for v0.4.5 but errors on v0.4.7, v0.5.0
When `TestModule` does not have `using Formatting`, everything works on all versions.
When `TestModule` does have `using Formatting`, the code works on v0.4.5 only:
1,000
From worker 2: 1,000
On v0.4.7 and v0.5.0 I get the error as previously posted:
ERROR: LoadError: On worker 2:
LoadError: LoadError: SystemError: opening file C:\Users\Greg\.julia\lib\v0.4\Compat.ji: No such file or directory
...
Maybe there is a problem with serialising dependent modules to remote workers?
Can anyone else reproduce this issue?
I’ve tried to look at which of the files in the error backtrace changed between v0.4.5 and v0.4.7.
`multi.jl` and `iostream.jl` did not change, but `loading.jl` has some changes.
PR #18230 changes `loading.jl` in v0.4.7 (apparently a backport of PR #18150).
Do you think this might be the cause of my issue?
https://github.com/JuliaLang/julia/pull/18230
I could not reproduce the issue. I tried on a Linux cluster running Julia 0.5.0, using workers on a different machine with an identical OS installed. The exact sequence of steps I followed:
- Generate Julia package called TestModule. Put your code there:
module TestModule
export test
#test(x) = println(x)
using Formatting
test(x) = println(format(x, commas=true))
end
- Close and restart Julia.
- Add workers on a different machine by running
addprocs(["machineID1";"machineID2"])
- Run:
using TestModule
test(1000) #Runs locally, as expected
remotecall_fetch(test, 3, 1000) # where 3 is the pid of a worker on another machine
Everything ran without error. Much of my own code, and code from other projects that I use in my work, consists of modules that use other modules. That hasn’t caused any errors on my research group’s Linux cluster with Julia 0.5.0. Based on the error you posted, it looks like you’re running Windows. Perhaps it’s a Windows-specific bug, or an issue with Windows not playing well with whatever OS is running on your remote worker? It might be worth filing an issue on GitHub.
Thanks Patrick for taking the time to test this.
Yes, I’m using Windows 7, so maybe it’s Windows-specific.
Could I ask whether you have `Compat` installed on the remote worker?
I’ll open an issue on GitHub.
Some more information (before I forget, and the reason for asking whether you have `Compat` installed).
If I have `Compat` installed and precompiled on the remote worker, then everything works. Interestingly:
- `Compat` needs to be precompiled (the worker looks for `Compat.ji`).
- The worker seems to ignore the location of its own package directory (as reported by `Pkg.dir()` on the remote worker). I’m guessing it might be using the same path as `Pkg.dir()` on the master process.
- The `Formatting` package does not need to be installed on the remote machine (only its dependency `Compat`).
The Julia install I tested on didn’t have Compat installed. I overlooked that.
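Given the earlier observation that everything works once `Compat` is installed and precompiled on the remote worker, one possible workaround is to do that from the master before loading the module. The following is a rough sketch in Julia 0.5 syntax, not a confirmed fix: `ensure_precompiled` is a hypothetical helper name, it assumes the remote machine has network access for `Pkg.add`, and it should be run on only one worker per remote machine to avoid concurrent package operations.

```julia
# Julia 0.5 syntax. `ensure_precompiled` is a hypothetical helper name.
# Installs `pkg` on the machine hosting worker `pid` and writes its .ji cache file there.
function ensure_precompiled(pid::Integer, pkg::AbstractString)
    remotecall_fetch(pid, pkg) do name
        Pkg.add(name)             # install into the worker's own package directory
        Base.compilecache(name)   # precompile; returns the path of the cache file
    end
end

# Example usage, assuming worker 2 lives on the remote machine:
# ensure_precompiled(2, "Compat")
# using TestModule
```

Since the worker appeared to ignore its own `Pkg.dir()`, this may only help when the package and cache paths on the remote machine match those on the master.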