I’m still new to GPU programming in Julia, and to Julia in general, so I’m quite confused about how to do parallel computation on a GPU. Any help would be greatly appreciated!
My current code (a simple example), which uses 20 CPU cores, is shown below. Basically, I use the Distributed package to create CPU workers, send the needed data to each worker, use pmap to split the whole job into batches and distribute them across the workers, and then collect the results.
I would like to take advantage of the large number of cores in a GPU (my V100 GPU should have 4000+ cores), but I couldn’t find enough references on how to make the transition. I would guess I need to create a CuArray somewhere, but I’m really confused about where to start, so thank you so much for any help and guidance!
using Distributed
N_worker = 20
addprocs(N_worker)
using DelimitedFiles
@everywhere using Distributions, Random, ParallelDataTransfer

Data = rand(10000, 25)  # in the real application, this will be imported from a CSV file
sendto(workers(), Data = Data)  # copy Data to every worker

@everywhere function F(i)
    noise = rand()  # scalar draw; rand(1) returns a 1-element Vector, and Float64 + Vector is a MethodError
    out = Data[i, 1] + noise  # this is silly but just an illustration of the real calculation (which will take some time per worker)
    return out
end

function parallel(NN)
    pool = CachingPool(workers())
    # one batch per worker; cld is integer ceiling division
    f_obj = pmap(F, pool, 1:NN, batch_size = cld(NN, N_worker))
    f_obj = hcat(f_obj...)  # 1 x NN matrix of results
    return f_obj
end

result = parallel(10000)
writedlm("result.txt", [result])
rmprocs(workers())