How to use distributed and pmap across GPU cores

Nadou407 · April 1, 2022, 8:16am

Hi all,

I’m still new to GPU programming in Julia, and to Julia in general, so I’m quite confused about how to do parallel computation on a GPU. Any help would be greatly appreciated!

My current code (a simplified example), which uses 20 CPU cores, is below. I use the Distributed package to create CPU workers, send the needed data to each worker, use pmap to split the whole job into batches and distribute them to the workers, and then collect the results.

I would like to take advantage of the large number of cores in a GPU (my V100 GPU should have 4000+ cores), but I haven’t found enough references on how to make the transition. I would guess I need to create a CuArray somewhere, but I’m really confused about where to start, so thank you so much for any help and guidance!

using Distributed
N_worker = 20
addprocs(N_worker)
using DelimitedFiles
@everywhere using Distributions, Random, ParallelDataTransfer

Data = rand(10000,25) #in real application, this will be imported from a CSV file
sendto(workers(), Data = Data)

@everywhere function F(i)
    eps = rand() #a scalar draw; rand(1) returns a 1-element Vector, and Number + Vector is a MethodError
    out = Data[i,1] + eps #this is silly, but it stands in for the real calculation (which will take some time per call)
    return out
end

function parallel(NN)
    pool  = CachingPool(workers())
    f_obj = pmap(F, pool, 1:NN, batch_size = cld(NN, N_worker))
    f_obj = hcat(f_obj...)
    return f_obj
end

result = parallel(10000)
writedlm("result.txt", [result])
rmprocs(workers())

lawless-m · April 1, 2022, 9:02am

While I am sure you can access the GPU from different processes, it will be considerably easier to work out your problem using a single CPU thread to launch the computation on the GPU's many cores.

So while learning the GPU side, I would drop Distributed.

I have found KernelAbstractions.jl simple to get started with:

https://juliagpu.github.io/KernelAbstractions.jl/stable
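
As a rough illustration of what that looks like for the toy problem above, here is a minimal sketch. It assumes KernelAbstractions.jl 0.9 or later (older releases used an event-based API instead of `synchronize`). The kernel name `add_noise!`, the variable names, and the workgroup size of 64 are made up for the example. It runs on plain CPU arrays as written; the idea is that passing CuArrays instead (with CUDA.jl loaded) would run the same kernel on the GPU.

```julia
using KernelAbstractions

# Each work-item handles one row: out[i] = col[i] + noise[i].
# `add_noise!` is a hypothetical name chosen for this sketch.
@kernel function add_noise!(out, @Const(col), @Const(noise))
    i = @index(Global)
    @inbounds out[i] = col[i] + noise[i]
end

n     = 10_000
col   = rand(Float32, n)   # stands in for Data[:, 1]
noise = rand(Float32, n)   # one random draw per row
out   = similar(col)

# The backend is inferred from the array type: CPU here, GPU for CuArrays.
backend = KernelAbstractions.get_backend(out)

kernel! = add_noise!(backend, 64)        # 64 = workgroup size (arbitrary choice)
kernel!(out, col, noise; ndrange = n)    # launch one work-item per row
KernelAbstractions.synchronize(backend)  # wait for the launch to finish
```

Note that there is no pmap and no worker pool here: a single process launches one kernel, and the backend spreads the 10,000 work-items over the available cores.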

maleadt · April 1, 2022, 1:04pm

See the introductory tutorial in the CUDA.jl documentation (Introduction · CUDA.jl). If possible, use array abstractions; if you need to, you can write custom kernels (either with CUDA.jl directly or using KernelAbstractions.jl).
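
For the toy example in the question, the array-abstraction route could look roughly like the sketch below. This is only an illustration under a few assumptions: the data fits in GPU memory, Float32 precision is acceptable, and the real per-row calculation can be written as broadcast operations. The names `ArrayT`, `d_data`, `d_eps` and `d_out` are made up for the example, and the CPU fallback is only there so the snippet also runs on a machine without a usable GPU.

```julia
using CUDA, Random

# Use the GPU when one is usable; otherwise fall back to plain CPU arrays
# so the same code still runs.
ArrayT = CUDA.functional() ? CuArray : Array

Data   = rand(Float32, 10_000, 25)   # stand-in for the data read from the CSV file
d_data = ArrayT(Data)                # a single host-to-device copy

# One random draw per row, filled in directly on the device.
d_eps = rand!(similar(d_data, size(d_data, 1)))

# With CuArrays, this broadcast runs as one GPU kernel over all 10,000 rows,
# replacing the pmap over row indices.
d_out = d_data[:, 1] .+ d_eps

result = Array(d_out)                # copy back to the host, e.g. for writedlm
```

The whole loop over `i` collapses into one broadcast, so there is nothing to batch or send to workers; the GPU handles the elementwise parallelism.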
