Memory problem when using SlurmClusterManager.jl to add workers

I am new to using Distributed.jl on a cluster. I am trying to run jobs on a cluster with multiple nodes. Even though the batch file requests a single node for my Julia program, memory does not seem to be shared between the worker processes.
The problem only seems to arise when I use SlurmClusterManager to add workers.

Does anyone have an idea what the problem is?

Here is my batch file:

#!/bin/bash
#SBATCH --ntasks=5
#SBATCH --nodes=1
#SBATCH --nodelist=node_01
#SBATCH --cpus-per-task=1
#SBATCH --time=00:04:00
#SBATCH --output=output/example-par-job_%j.out

# Load the Julia module
module purge
module load Julia/1.10.2

# Run the Julia script
julia --threads 1 par_test_script.jl

This is the content of par_test_script.jl. It causes an error because the workers cannot find m and output in memory:

using Distributed, SharedArrays, SlurmClusterManager

# Add local workers
addprocs(SlurmManager())
println("Number of workers: ", nworkers())

@everywhere begin
    using SharedArrays
    
    m = SharedArray{Int}(2)
    m[1] = 1
    m[2] = 2

    function foo(a, m)
        println("Worker ID: $(myid())")
        println(gethostname())
        return sum(a .+ m)
    end
end

N = 10

# Shared array for result collection
output = SharedArray{Int}(N)

@sync @distributed for i in 1:N
    output[i] = foo(i, m)
end

display(output)
println("Finished Julia script")

Here is the result:

UNHANDLED TASK ERROR: On worker 2:
BoundsError: attempt to access 0-element Vector{Int64} at index [1]
Stacktrace:
  [1] setindex!
    @ ./array.jl:1021 [inlined]
  [2] setindex!
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/SharedArrays/src/SharedArrays.jl:512
  [3] macro expansion
    @ ~/test_par_prjct/par_test_script.jl:36 [inlined]
  [4] #1
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:303
  [5] #178
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:83
  [6] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
  [7] invokelatest
    @ ./essentials.jl:889
  [8] #107
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:283
  [9] run_work_thunk
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:70
 [10] run_work_thunk
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:79
 [11] #100
    @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:88

...and 4 more exceptions.

Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:448
 [2] macro expansion
   @ ./task.jl:480 [inlined]
 [3] (::Distributed.var"#177#179"{var"#1#2", UnitRange{Int64}})()
   @ Distributed /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:278
ERROR: LoadError: TaskFailedException

    nested task error: On worker 2:
    BoundsError: attempt to access 0-element Vector{Int64} at index [1]
    Stacktrace:
      [1] setindex!
        @ ./array.jl:1021 [inlined]
      [2] setindex!
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/SharedArrays/src/SharedArrays.jl:512
      [3] macro expansion
        @ ~/test_par_prjct/par_test_script.jl:36 [inlined]
      [4] #1
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:303
      [5] #178
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:83
      [6] #invokelatest#2
        @ ./essentials.jl:892 [inlined]
      [7] invokelatest
        @ ./essentials.jl:889
      [8] #107
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:283
      [9] run_work_thunk
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:70
     [10] run_work_thunk
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:79
     [11] #100
        @ /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:88
    
    ...and 4 more exceptions.
    
    Stacktrace:
     [1] sync_end(c::Channel{Any})
       @ Base ./task.jl:448
     [2] macro expansion
       @ ./task.jl:480 [inlined]
     [3] (::Distributed.var"#177#179"{var"#1#2", UnitRange{Int64}})()
       @ Distributed /software/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-13.2.0/julia-1.10.2-4md6o2sitswrvm6wlfiaa4llylglc2rq/share/julia/stdlib/v1.10/Distributed/src/macros.jl:278
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:448
 [2] macro expansion
   @ task.jl:480 [inlined]
 [3] top-level scope
   @ ~/test_par_prjct/par_test_script.jl:478
in expression starting at /test_par_prjct/par_test_script.jl:35
Number of workers: 5
Worker ID: 3
node_01
Worker ID: 2
Worker ID: 4
node_01
node_01
Worker ID: 5
node_01
Worker ID: 6
node_01


This Julia script does not cause an error. It is identical except that it adds the workers manually with addprocs(5) instead of addprocs(SlurmManager()):

using Distributed, SharedArrays, SlurmClusterManager

# Add local workers
addprocs(5)
println("Number of workers: ", nworkers())

@everywhere begin
    using SharedArrays
    
    m = SharedArray{Int}(2)
    m[1] = 1
    m[2] = 2

    function foo(a, m)
        println("Worker ID: $(myid())")
        println(gethostname())
        return sum(a .+ m)
    end
end

N = 10

# Shared array for result collection
output = SharedArray{Int}(N)

@sync @distributed for i in 1:N
    output[i] = foo(i, m)
end

display(output)
println("Finished Julia script")

Here is the output:

Number of workers: 5
      From worker 6:	Worker ID: 6
      From worker 6:	node_01
      From worker 6:	Worker ID: 6
      From worker 6:	node_01
      From worker 4:	Worker ID: 4
      From worker 4:	node_01
      From worker 4:	Worker ID: 4
      From worker 4:	node_01
      From worker 3:	Worker ID: 3
      From worker 3:	node_01
      From worker 3:	Worker ID: 3
      From worker 3:	node_01
      From worker 2:	Worker ID: 2
      From worker 2:	node_01
      From worker 2:	Worker ID: 2
      From worker 2:	node_01
      From worker 5:	Worker ID: 5
      From worker 5:	node_01
      From worker 5:	Worker ID: 5
      From worker 5:	node_01
10-element SharedVector{Int64}:
  5
  7
  9
 11
 13
 15
 17
 19
 21
 23
Finished Julia script

I can reproduce your exact error. I think the problem is here:

@sync @distributed for i in 1:N
    output[i] = foo(i, m)
end

Removing the output[i] assignment fixed the issue for me.
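If the goal is just to collect the results back on the master process, a SharedArray is not strictly necessary. A possible workaround (a sketch, not a fix for the underlying problem) is to use pmap, which returns each result to the calling process over the network and therefore does not depend on processes sharing memory. Here addprocs(2) stands in for addprocs(SlurmManager()):

```julia
using Distributed

addprocs(2)  # stand-in for addprocs(SlurmManager()) in this sketch

@everywhere function foo(a, m)
    return sum(a .+ m)
end

m = [1, 2]
N = 10

# pmap sends each iteration to a worker and returns the result to the
# master process, so no shared memory between processes is required.
# The closure captures m, which is serialized to the workers.
output = pmap(i -> foo(i, m), 1:N)

println(output)  # [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
```

This matches the output of the working script above, at the cost of serializing results back to the master instead of writing them into a shared segment.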

julia> @sync @distributed for i in 1:N
            foo(i, m)
       end
Worker ID: 3
Worker ID: 3
Worker ID: 4
Worker ID: 4
Worker ID: 6
Worker ID: 6
Worker ID: 2
Worker ID: 2
Worker ID: 5
Worker ID: 5
Task (done) @0x000077c930b7c010

but I am not really sure why.

It seems to be a bug in SharedArrays:

julia> fetch(@spawnat 1 output[1:end])
10-element Vector{Int64}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

julia> fetch(@spawnat 2 output[1:end]) # on the worker, shows 10 elements
10-element Vector{Int64}:
                   0
          8589934616
          4294967297
 1157425241673170952
                   1
          8589934608
          4294967297
  581245895626981384
                   1
                  32

julia> fetch(@spawnat 2 output[3])    # on the worker, but indexing fails. 
ERROR: On worker 2:
BoundsError: attempt to access 0-element Vector{Int64} at index [3]
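One way to check whether the shared segment is actually mapped on every process (a hedged diagnostic, not a fix) is to ask the SharedArray itself. SharedArrays exports procs and localindices for this; if a worker pid is missing from procs(output), or its local view is empty, indexing on that worker will fail the way it does above. Again, addprocs(2) stands in for addprocs(SlurmManager()):

```julia
using Distributed, SharedArrays

addprocs(2)  # stand-in for addprocs(SlurmManager()) in this sketch
@everywhere using SharedArrays

output = SharedArray{Int}(10)

# procs(output) lists the pids that share the memory segment.
println("mapped on pids: ", procs(output))

# localindices shows which chunk of the array each process "owns";
# an empty range on a worker indicates the mapping did not reach it.
for p in procs(output)
    println(p, " => ", fetch(@spawnat p localindices(output)))
end
```

Comparing this output between the addprocs(5) run and the SlurmManager() run should show whether the workers launched via srun are failing to attach to the shared segment.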