CUDA performing scalar indexing when used with Distributed

A possible workaround is to create the CuArray on each worker and access it through a getter function, so the @distributed closure never captures a CuArray that would have to be serialized:

using Distributed

addprocs(2)                  # start two worker processes
@everywhere begin
    using CUDA
    x = CUDA.rand(10)        # each worker allocates its own CuArray
    get_x() = x              # getter returns the worker-local array
end

@sync @distributed for i = 1:2
    println(get_x())         # runs on a worker; the closure only references get_x, not a CuArray
end
#=
      From worker 2:    Float32[0.44315395, 0.8780446, 0.21944213, 0.36170566, 0.14836204, 0.11738869, 0.726818, 0.1946531, 0.09105217, 0.9457448]
      From worker 3:    Float32[0.32678527, 0.65252995, 0.19543259, 0.69162387, 0.9956036, 0.3051676, 0.86222124, 0.18076622, 0.9949689, 0.45308512]
Task (done) @0x000002496fed3a30
=#
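
For contrast, here is a minimal sketch of the pattern the getter avoids; this is my own illustration rather than code from the thread, and it assumes the problem comes from the @distributed closure capturing the master's CuArray, which Distributed then has to serialize to each worker:

using Distributed

addprocs(2)
@everywhere using CUDA

x = CUDA.rand(10)   # allocated on the master process only

@sync @distributed for i = 1:2
    # Referencing `x` here makes the closure capture the master's CuArray,
    # so it must be shipped to each worker; presumably this serialization
    # is what triggers the scalar-indexing path that the getter sidesteps.
    println(x)
end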