How to parallelize dual coordinate descent methods on a GPU using CUDA.jl?

The `@atomic` macro bypasses some of the `getindex` functionality, and is a little fragile as a result. In this case, it first fails because you're indexing with an `Int32`, and then because you're adding a `Float64` value to a `Float32` array. Using `@atomic d[Int(j)] += Float32(Δαᵢ * Xᵢⱼ)` works around both: `Int(j)` converts the index, and `Float32(...)` matches the operand type to the array's element type.
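For context, here is a minimal kernel sketch showing how those two conversions fit together. It assumes `d` is a `Float32` device array shared across threads, `X` and `Δα` are `Float64` inputs, and the names mirror the snippet above; the kernel shape and launch parameters are illustrative, not the original poster's code:

```julia
using CUDA

# Each thread i scatters its contribution Δαᵢ * Xᵢⱼ into the shared
# accumulator d. Concurrent writes to the same d[j] require @atomic.
function scatter_update!(d, X, Δα)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(X, 1)
        Δαᵢ = Δα[i]
        for j in Int32(1):Int32(size(X, 2))
            # Int(j) avoids the Int32-indexing issue; Float32(...) keeps
            # the value matched to eltype(d) so the atomic add compiles.
            CUDA.@atomic d[Int(j)] += Float32(Δαᵢ * X[i, j])
        end
    end
    return nothing
end

d  = CUDA.zeros(Float32, 4)    # Float32 accumulator
X  = CUDA.rand(Float64, 8, 4)  # Float64 inputs, hence the conversion
Δα = CUDA.rand(Float64, 8)

@cuda threads=8 scatter_update!(d, X, Δα)
```

Note that atomics serialize under contention, so if many threads hit the same entries of `d`, per-block partial sums followed by a reduction will usually scale better.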
