Hi all,

I’m still new to GPU programming in Julia, and to Julia in general, so I’m quite confused about how to do parallel computation on a GPU. Any help would be greatly appreciated!

My current code (a simple example), which uses 20 CPU cores, is below. Basically, I use the Distributed package to create CPU workers, send the needed data to each worker, use `pmap` to split the whole job into batches and distribute them to the workers, and then collect the results back.

I would like to take advantage of the large number of cores in a GPU (my V100 GPU should have 4000+ cores), but I haven’t found enough references on how to make the transition. I would guess I need to create a CuArray somewhere, but I’m really confused about where to start, so thank you so much for any help and guidance!

```julia
using Distributed
N_worker = 20
addprocs(N_worker)
using DelimitedFiles
@everywhere using Distributions, Random, ParallelDataTransfer
Data = rand(10000,25) #in real application, this will be imported from a CSV file
sendto(workers(), Data = Data)
@everywhere function F(i)
    # scalar draw; rand(1) returns a 1-element Vector, and Float64 + Vector throws a MethodError
    noise = rand()
    out = Data[i,1] + noise #this is silly but just an illustration of the real calculation (which will take some time per worker)
    return out
end
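function parallel(NN)
    pool = CachingPool(workers())
    # one batch per worker; cld is integer ceiling division
    f_obj = pmap(F, pool, 1:NN, batch_size = cld(NN, N_worker))
    f_obj = hcat(f_obj...) # collect the per-row results into a 1×NN matrix
    return f_obj
end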
result = parallel(10000)
writedlm("result.txt", [result])
rmprocs(workers())
```
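
For reference, here is my rough guess at what a GPU starting point might look like. It is only a sketch: it assumes the CUDA.jl package and a working CUDA GPU, and it assumes the per-row work can be rewritten as elementwise array operations, which holds for this toy example but which I have not checked for my real calculation. The names `d_col`, `d_noise` and `d_out` are just placeholders I made up.

```julia
using CUDA

# Toy data on the CPU; Float32 is the usual element type on GPUs
data = rand(Float32, 10_000, 25)

# Copy the first column to GPU memory
d_col = CuArray(data[:, 1])

# Draw one random number per row directly on the GPU
d_noise = CUDA.rand(Float32, length(d_col))

# Broadcasting over CuArrays runs as a GPU kernel, one element per thread,
# so there is no explicit loop over rows and no pmap
d_out = d_col .+ d_noise

# Copy the result back to CPU memory for writing to disk
result = Array(d_out)
```

If this is the right direction, the main change from the `pmap` version is that the function is applied to whole arrays at once instead of being called once per row index.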