How to parallelize dual coordinate descent methods on GPU using CUDA.jl?

@maleadt I’m sorry, I was too quick to mark your answer as the solution, but I’m stuck again…

I implemented the code for my own model (i.e. the dummy rule Δαᵢ = 0.1 * xᵢ⊤d is replaced with my actual update equation). When the code was executed on the CPU, the result was fine. However, when it was executed on the GPU with blocks=Int(floor(numsample/32)) threads=32 instead of threads=numsample, the numerical results blew up to extremely large values…
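For reference, here is a minimal sketch of the kind of kernel I mean. It is not my exact code: the write-back of Δα into d, the Float32 literals, and the argument names are illustrative assumptions, but the read xᵢ⊤d += Xᵢⱼ * d[j] and the two launch configurations are the ones I described above.

```julia
# Illustrative sketch only -- not my exact kernel. The write-back of Δα into d
# and the Float32 types are assumptions made for this minimal example.
using CUDA

function dcd_kernel!(α, d, X, numsample, numfeature)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= numsample
        # xᵢ⊤d : plain reads of the (supposedly) shared vector d
        xd = 0.0f0
        for j in 1:numfeature
            xd += X[i, j] * d[j]
        end
        Δα = 0.1f0 * xd              # dummy update rule
        α[i] += Δα
        # write the change back into d while other blocks may also be
        # reading and writing d concurrently
        for j in 1:numfeature
            CUDA.@atomic d[j] += Δα * X[i, j]
        end
    end
    return nothing
end

# works fine (single block):
#   @cuda threads=numsample dcd_kernel!(α, d, X, numsample, numfeature)
# blows up numerically (multiple blocks):
#   @cuda blocks=Int(floor(numsample/32)) threads=32 dcd_kernel!(α, d, X, numsample, numfeature)
```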

Now I would like to ask the following.

  • Can d::CuVector be shared by all blocks and threads? (see also the small check sketched after this list)
    • I expected that reading d (at xᵢ⊤d += Xᵢⱼ * d[j]) is an atomic read, so that while one thread is reading d[j], no other block or thread can alter or corrupt the value d[j] being read.
    • I expected that an atomic update to d is reflected in all blocks and threads immediately after the update.
  • Or is d::CuVector just a host-side representation, so that d is no longer shared once it has been copied to GPU memory? If so, is there any way to realize my expectations?
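To make precise what I mean by “shared”, here is a tiny self-contained check I imagine running (the kernel name incr_kernel! and the launch sizes are just for illustration): every thread atomically increments d[1], and if d is one device array visible to all blocks, the final value should equal blocks * threads.

```julia
# Tiny check of what I mean by "shared": every thread atomically increments
# d[1]; if d is a single device array visible to all blocks, the result
# should be blocks * threads.
using CUDA

function incr_kernel!(d)
    CUDA.@atomic d[1] += 1.0f0
    return nothing
end

d = CUDA.zeros(Float32, 1)
@cuda blocks=4 threads=32 incr_kernel!(d)
Array(d)[1] == 4 * 32   # I expect this to be true if d is truly shared
```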