How to parallelize dual coordinate descent methods on GPU using CUDA.jl?

Thank you so much again!

The algorithm works, and the objective function decreases monotonically. However, the code ended up being slower than the CPU version… I’ve come to think that this algorithm is not suitable for GPUs.

Even so, this was my first time using CUDA, and it was a great learning experience for me. I will try again with other algorithms.

Thanks again!