How to parallelize dual coordinate descent methods on a GPU using CUDA.jl?

The `@atomic` macro bypasses some of the `getindex` functionality, and is a little fragile as a result. In this case, it first fails because you're indexing with an `Int32`, and then because you're adding a `Float64` value to a `Float32` array. Using `@atomic d[Int(j)] += Float32(Δαᵢ * Xᵢⱼ)` works around both: `Int(j)` converts the index, and `Float32(...)` matches the operand type to the array's element type.
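For context, here is a minimal kernel sketch showing how those two conversions fit together. It assumes `d` is a `Float32` device array shared across threads, `X` and `Δα` are `Float64` inputs, and the names mirror the snippet above; the kernel shape and launch parameters are illustrative, not the original poster's code:

```julia
using CUDA

# Each thread i scatters its contribution Δαᵢ * Xᵢⱼ into the shared
# accumulator d. Concurrent writes to the same d[j] require @atomic.
function scatter_update!(d, X, Δα)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(X, 1)
        Δαᵢ = Δα[i]
        for j in Int32(1):Int32(size(X, 2))
            # Int(j) avoids the Int32-indexing issue; Float32(...) keeps
            # the value matched to eltype(d) so the atomic add compiles.
            CUDA.@atomic d[Int(j)] += Float32(Δαᵢ * X[i, j])
        end
    end
    return nothing
end

d  = CUDA.zeros(Float32, 4)    # Float32 accumulator
X  = CUDA.rand(Float64, 8, 4)  # Float64 inputs, hence the conversion
Δα = CUDA.rand(Float64, 8)

@cuda threads=8 scatter_update!(d, X, Δα)
```

Note that atomics serialize under contention, so if many threads hit the same entries of `d`, per-block partial sums followed by a reduction will usually scale better.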
