Hi,
First of all, CUDA.jl is simply amazing and really easy to use, kudos to you all!
Now, in my problem I have multiple GPUs and I would like to train several networks in parallel, i.e. basically run one model training per GPU.
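Concretely, this is the kind of pattern I have in mind (just a rough sketch; `train_model` and the configuration values are placeholders for my actual training code):

```julia
using Distributed, CUDA

addprocs(length(devices()))          # one worker per GPU
@everywhere using CUDA

# pin each worker to its own device, once, up front
asyncmap(zip(workers(), devices())) do (p, d)
    remotecall_wait(p) do
        @info "Worker $p uses $d"
        device!(d)
    end
end

# placeholder for the real training routine; it runs on whatever
# device its worker was assigned above
@everywhere function train_model(config)
    x = CUDA.rand(1024, 1024)
    return sum(config .* x)
end

# one independent training per configuration, scheduled across the workers
results = pmap(train_model, [0.1, 0.2, 0.3, 0.4])
```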
Oh ok, I thought that the code to run needed to be inside the remotecall_wait call.
Now I face another issue. When running the code from the docs, I get:
[ Info: Worker 2 uses CuDevice(0)
ERROR: LoadError: On worker 2:
UndefRefError: access to undefined reference
getindex at ./array.jl:809 [inlined]
context at /home/theo/.julia/packages/CUDA/dZvbp/src/state.jl:242 [inlined]
device! at /home/theo/.julia/packages/CUDA/dZvbp/src/state.jl:286
device! at /home/theo/.julia/packages/CUDA/dZvbp/src/state.jl:265 [inlined]
#32 at /home/theo/experiments/ParticleFlow/julia/scripts/run_swag.jl:17
#110 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:309
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88
#96 at ./task.jl:356
Stacktrace:
[1] (::Base.var"#770#772")(::Task) at ./asyncmap.jl:178
[2] foreach(::Base.var"#770#772", ::Array{Any,1}) at ./abstractarray.jl:2009
[3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}) at ./asyncmap.jl:178
[4] wrap_n_exec_twice at ./asyncmap.jl:154 [inlined]
[5] async_usemap(::var"#31#33", ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}; ntasks::Int64, batch_size::Nothing) at ./asyncmap.jl:103
[6] #asyncmap#754 at ./asyncmap.jl:81 [inlined]
[7] asyncmap(::Function, ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}) at ./asyncmap.jl:8
Note that on my machine I only have one GPU, but I face the same error on the cluster with 8 GPUs.
I can’t reproduce the pmap issue, but let’s keep that discussion in the issue.
For the other error: you’re running out of GPU memory, nothing we can do about that. Same for the CUBLAS initialization error you reported on Slack; that probably happens because of high memory pressure (https://github.com/JuliaGPU/CUDA.jl/issues/340).
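If you want to see how much pressure you’re under, something like this can help between independent runs (a diagnostic sketch only; it won’t make a model fit that genuinely exceeds device memory):

```julia
using CUDA

CUDA.memory_status()   # print how much device memory is currently in use

GC.gc(true)            # collect unreferenced CuArrays on the Julia side
CUDA.reclaim()         # return freed pool memory to the CUDA driver
```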
I did manage to reproduce the original issue, fixed here: https://github.com/JuliaGPU/CUDA.jl/pull/471. This only occurs when passing a CuDevice to a new process, so it doesn’t have any other impact. As a workaround, calling context() as you did is sufficient.
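In other words, something along these lines should work until the fix lands (a sketch of how I understand the workaround, assuming the docs-style assignment loop):

```julia
using Distributed, CUDA

# workers already added with addprocs; initialize CUDA on each worker
# by calling context() before switching to the passed-in CuDevice
asyncmap(zip(workers(), devices())) do (p, d)
    remotecall_wait(p) do
        context()    # forces initialization of the worker's CUDA state
        device!(d)
        @info "Worker $p uses $d"
    end
end
```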