`@everywhere begin ... end` errors when loading packages (but only once)

Consider a directory that contains only the following two files.

# one.jl
using Distributed
addprocs(4 - nprocs() + 1)

@everywhere begin
    using Pkg
    Pkg.activate(".")
    using MAT
end

println("done")

# two.jl
using Distributed
addprocs(4 - nprocs() + 1)

@everywhere begin
    using Pkg
    Pkg.activate(".")
end

@everywhere using MAT

println("done")

(Note that the only difference between the two is where using MAT occurs.)

I add MAT.jl to a new environment in the current directory with $ julia -e 'using Pkg; Pkg.activate("."); Pkg.add("MAT")'.

Now, including one.jl from the REPL throws an error the first time (including it a second time in the same session works fine), whereas two.jl can be included from the REPL without any issues:

$ julia --project --banner=no
julia> include("one.jl")
ERROR: LoadError: On worker 2:
ArgumentError: Package MAT [23992714-dd62-5051-b70f-ba57cb901cac] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

Stacktrace:
 [1] _require
   @ ./loading.jl:1012
 [2] require
   @ ./loading.jl:936
 [3] #1
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/Distributed.jl:79
 [4] #103
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:274
 [5] run_work_thunk
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:63
 [6] run_work_thunk
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:72
 [7] #96
   @ ./task.jl:411

...and 3 more exceptions.

Stacktrace:
  [1] sync_end(c::Channel{Any})
    @ Base ./task.jl:369
  [2] macro expansion
    @ ./task.jl:388 [inlined]
  [3] _require_callback(mod::Base.PkgId)
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/Distributed.jl:76
  [4] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
  [5] invokelatest
    @ ./essentials.jl:706 [inlined]
  [6] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:942
  [7] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:923
  [8] top-level scope
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:204
  [9] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [10] top-level scope
    @ REPL[1]:1
in expression starting at /home/steven/tmp/one.jl:4

julia> include("one.jl")
  Activating environment at `~/tmp/Project.toml`
      From worker 3:	  Activating environment at `~/tmp/Project.toml`
      From worker 4:	  Activating environment at `~/tmp/Project.toml`
      From worker 2:	  Activating environment at `~/tmp/Project.toml`
      From worker 5:	  Activating environment at `~/tmp/Project.toml`
done

$ julia --project --banner=no
julia> include("two.jl")
  Activating environment at `~/tmp/Project.toml`
      From worker 3:	  Activating environment at `~/tmp/Project.toml`
      From worker 5:	  Activating environment at `~/tmp/Project.toml`
      From worker 4:	  Activating environment at `~/tmp/Project.toml`
      From worker 2:	  Activating environment at `~/tmp/Project.toml`
done

Can someone please explain why one.jl fails but two.jl does not? And why one.jl fails only once in a Julia session?

Thanks in advance!

EDIT: I’m using Julia 1.6.2, but I see the same behavior on 1.6.1.

Check the ultimate guide: https://github.com/juliohm/julia-distributed-computing

Thanks for the link. I saw this part that is relevant:

  • We wrap the preamble into two @everywhere begin … end blocks. … Two separate blocks are needed so that the environment is properly instantiated in all processes before we start loading packages.

But I don’t think this quote really answers my question. I already saw that two @everywhere blocks were needed, but why? I’m running the processes on the same machine, so I thought that instantiating the environment for each process wouldn’t be necessary because all the packages are already installed. But even if I did need to run Pkg.instantiate(), why do I need two @everywhere blocks? Maybe I’m missing something from your link?


I guess I thought that

@everywhere begin
    line 1
    line 2
    ...
end

was equivalent to

@everywhere line 1
@everywhere line 2
@everywhere ...

which in turn was equivalent to running

line 1
line 2
...

on each process. So I would think that if the code runs correctly on a single process (without @everywhere), then it should work on multiple processes (at least on the same machine). But I can see that’s wrong, so I’m wondering what I’m missing.

I don’t think you are missing anything. It is just the way it currently works. You can open an issue in the Julia language repository if you think that the two-block constraint should be fixed.

As explained in the link, you need to fully instantiate the environment everywhere before you start loading packages. I don’t see a major issue with two blocks. Maybe I am missing your point?

Previously I stated that the quote I pulled didn’t answer my question, but I guess it technically does. (I.e., one.jl fails because the environment needs to be fully instantiated everywhere before loading packages.)

But I don’t understand why the environment has to be fully instantiated everywhere before loading packages. If I have the following

@everywhere begin
    using Pkg
    Pkg.activate(".")
    using MAT
end

then if we look at one of the processes, after the call to activate the environment is ready for that process, so it should be able to load MAT. But that’s not how it works, and I’m wondering why.

Furthermore, it’s not just that the environment has to be instantiated everywhere before loading packages; otherwise the following should work:

@everywhere begin
    using Pkg
    Pkg.activate(".")
    sleep(5) # Give time for the other processes to be fully instantiated
    using MAT
end

So there’s something else that’s happening at the end of the @everywhere block that finalizes the environment setup.

Oh, me neither; I was mostly just curious why the two blocks are necessary. I ran into the issue because I had code like

using Pkg
Pkg.activate(".")
using Package
using AnotherPackage

and I wanted to parallelize it by wrapping the using statements (including activating the environment) in a single @everywhere block.

If you want to use a single block, you can just @eval the using statements. That’s what I do, and it seems to work fine with any number of processes or packages. I’m not really convinced by the handwaving about environments needing to instantiate fully. It doesn’t make much sense to me, and I haven’t seen a very precise explanation.
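
For concreteness, here is a minimal sketch of that single-block variant. It assumes the same setup as one.jl in the question: it is run from the directory whose environment already has MAT added. The only change is the @eval in front of using MAT. This thread only reports that it seems to work; it does not establish that it behaves the same on every Julia version.

```julia
# one_eval.jl (hypothetical file name): one.jl with the `using` wrapped in @eval
using Distributed
addprocs(4 - nprocs() + 1)   # same worker setup as in the question

@everywhere begin
    using Pkg
    Pkg.activate(".")        # activate the environment in the current directory
    @eval using MAT          # wrapped in a macro call instead of a bare `using`
end

println("done")
```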

It looks like the @everywhere macro does some pre-processing of the expression to handle using statements, which might explain some of the weirdness you encounter. At any rate, @eval (or any other macro, like @time) causes the statements to get passed over and treated normally.
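
To illustrate what "passed over" could mean, here is a rough sketch of a pre-processing step that collects bare using/import statements from a block. The helper collect_imports is hypothetical and written only for this post; it is not Distributed's actual code, just a guess at the general shape. It would fit the stack trace in the question, where require is called on the calling process from Distributed's macros.jl and then reaches the workers through _require_callback.

```julia
# Hypothetical helper, NOT part of Distributed: gather the `using`/`import`
# statements that appear directly inside a block expression.
collect_imports!(found, x) = found            # ignore non-Expr nodes (line numbers, symbols)
function collect_imports!(found, ex::Expr)
    if ex.head in (:using, :import)
        push!(found, ex)                      # a bare using/import statement
    elseif ex.head in (:block, :toplevel)
        foreach(arg -> collect_imports!(found, arg), ex.args)
    end                                       # anything else (e.g. a macro call) is skipped
    return found
end
collect_imports(ex) = collect_imports!(Expr[], ex)

plain = quote
    using Pkg
    Pkg.activate(".")
    using MAT
end

wrapped = quote
    using Pkg
    Pkg.activate(".")
    @eval using MAT
end

println(collect_imports(plain))    # both `using` statements are found
println(collect_imports(wrapped))  # only `using Pkg`; the @eval call is opaque to the walker
```

If @everywhere does something along these lines and runs the collected statements before shipping the block to the workers, then a bare using MAT would be handled before any worker has run Pkg.activate("."), while @eval using MAT would only run as part of the block itself. Running @macroexpand on the @everywhere block in your own session is a way to check what your Julia version actually generates.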

EDIT: removed a paragraph where I speculated incorrectly.
