CUDA performing scalar indexing when used with Distributed

A possible workaround is to create the CuArray on each worker and access it through a getter function, so the @distributed closure never captures a CuArray that would have to be serialized:

using Distributed

addprocs(2)                  # start two worker processes
@everywhere begin
    using CUDA
    x = CUDA.rand(10)        # each worker allocates its own CuArray
    get_x() = x              # getter returns the worker-local array
end

@sync @distributed for i = 1:2
    println(get_x())         # runs on a worker; the closure only references get_x, not a CuArray
end
#=
      From worker 2:    Float32[0.44315395, 0.8780446, 0.21944213, 0.36170566, 0.14836204, 0.11738869, 0.726818, 0.1946531, 0.09105217, 0.9457448]
      From worker 3:    Float32[0.32678527, 0.65252995, 0.19543259, 0.69162387, 0.9956036, 0.3051676, 0.86222124, 0.18076622, 0.9949689, 0.45308512]
Task (done) @0x000002496fed3a30
=#
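
For contrast, here is a minimal sketch of the pattern the getter avoids; this is my own illustration rather than code from the thread, and it assumes the problem comes from the @distributed closure capturing the master's CuArray, which Distributed then has to serialize to each worker:

using Distributed

addprocs(2)
@everywhere using CUDA

x = CUDA.rand(10)   # allocated on the master process only

@sync @distributed for i = 1:2
    # Referencing `x` here makes the closure capture the master's CuArray,
    # so it must be shipped to each worker; presumably this serialization
    # is what triggers the scalar-indexing path that the getter sidesteps.
    println(x)
end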