How to use Distributed and pmap across GPU cores

Hi all,

I’m still new to GPU programming in Julia, and to Julia in general, so I’m quite confused about how to do parallel computation on a GPU. Any help would be greatly appreciated!

My current code (a simple example) using 20 CPU cores is below. Basically, I use the Distributed package to create CPU workers, send the needed data to each worker, use pmap to split the whole job into batches and distribute them to the workers, then collect the results back.

I would like to take advantage of the large number of cores in a GPU (my V100 should have 4000+ CUDA cores), but I haven’t found enough references on how to make the transition. I would guess I need to create a CuArray somewhere, but I’m really confused about where to start, so thank you so much for any help and guidance!

using Distributed
N_worker = 20
addprocs(N_worker)
using DelimitedFiles
@everywhere using Distributions, Random, ParallelDataTransfer

Data = rand(10000,25) #in real application, this will be imported from a CSV file
sendto(workers(), Data = Data)

@everywhere function F(i)
    eps = rand()                  # one random draw per task (rand() gives a scalar, not a 1-element vector)
    out = Data[i, 1] + eps        # silly, but stands in for the real per-task calculation, which takes some time
    return out
end

function parallel(NN)
    pool  = CachingPool(workers())
    f_obj = pmap(F, pool, 1:NN, batch_size = cld(NN, N_worker))  # roughly one batch per worker; pmap returns a Vector of results
    return f_obj
end

result = parallel(10000)
writedlm("result.txt", result)
rmprocs(workers())

While I am sure you can access the GPU from different processes, it will be considerably easier to work out your problem using a single CPU thread to initiate the computation on the many GPU cores.

So while learning the GPU side, I would abandon Distributed for now.

I have found KernelAbstractions.jl simple to get started with:

https://juliagpu.github.io/KernelAbstractions.jl/stable
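
For example, the per-row calculation from the question could be written as a KernelAbstractions kernel along these lines. This is only a minimal sketch, assuming a recent KernelAbstractions (0.9+) together with CUDA.jl (which provides the CUDABackend); the kernel name add_noise! and the pre-generated noise array are illustrative choices, not something from the original code.

using CUDA, KernelAbstractions

@kernel function add_noise!(out, data, noise)
    i = @index(Global)                 # one GPU thread per row
    out[i] = data[i, 1] + noise[i]
end

Data_d  = CuArray(rand(10_000, 25))    # move the data to the GPU once
noise_d = CUDA.rand(10_000)            # pre-generate the random draws on the GPU
out_d   = similar(noise_d)

backend = get_backend(out_d)           # a CUDABackend here
kernel  = add_noise!(backend)
kernel(out_d, Data_d, noise_d; ndrange = length(out_d))
KernelAbstractions.synchronize(backend)

result = Array(out_d)                  # copy the result back to the CPU

A single launch covers the whole range, so each GPU thread handles one row; there is no per-task worker setup or data transfer as with pmap.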


See the introductory tutorial: Introduction · CUDA.jl. If possible, use array abstractions, and if you need to, you can write custom kernels (either with CUDA.jl directly or using KernelAbstractions.jl).
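
For a calculation as simple as the one in the question, array abstractions alone should be enough: broadcasting over CuArrays already runs on the GPU as a fused kernel, with no explicit kernel code. A rough sketch under that assumption (Data is the matrix from the question; the _d suffix just marks device arrays):

using CUDA

Data_d   = CuArray(Data)                               # upload the data once
result_d = Data_d[:, 1] .+ CUDA.rand(size(Data_d, 1))  # fused broadcast runs as one GPU kernel
result   = Array(result_d)                             # copy back to the CPU for writedlm etc.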
