How to parallelize dual coordinate descent methods on GPU using CUDA.jl?

Thanks! I’ve now tried @inbounds as you advised; next I will try shared memory. I now understand that GPU-specific optimization is necessary to get the true performance out of GPUs…