How to parallelize dual coordinate descent methods on GPU using CUDA.jl?

Your assignment to `d` might happen after some threads have already read its value. You need calls to `sync_threads()` to prevent that. Note that this only works because you're launching a single block: `sync_threads()` synchronizes the threads within one block, so it cannot synchronize threads across multiple blocks. Being limited to a single block isn't ideal for performance.
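A minimal sketch of the barrier pattern, assuming one thread computes the scalar `d` and every thread in a single block then reads it. The kernel name `coord_update_kernel!`, the arrays `x` and `g`, `stepsize`, and the update rule are hypothetical placeholders, not the original poster's dual coordinate descent code. `d` is kept in block-shared memory via `CuStaticSharedArray` so all threads see the same value.

```julia
using CUDA

# Hypothetical kernel: thread 1 writes a shared scalar `d`, all threads in the block read it.
function coord_update_kernel!(x, g, stepsize, niters)
    d = CuStaticSharedArray(Float32, 1)   # one Float32 shared by the whole block
    i = threadIdx().x
    for k in 1:niters
        if i == 1
            d[1] = stepsize * g[k]        # placeholder per-iteration scalar update
        end
        sync_threads()                    # barrier: thread 1's write lands before anyone reads d
        if i <= length(x)
            x[i] -= d[1]
        end
        sync_threads()                    # barrier: all reads finish before d is overwritten next iteration
    end
    return nothing
end

n, niters = 256, 3
x = CUDA.ones(Float32, n)
g = CUDA.fill(2f0, niters)
# Launch a single block: sync_threads() only synchronizes threads within one block.
@cuda threads=n coord_update_kernel!(x, g, 0.5f0, niters)
```

The second `sync_threads()` matters when the update runs in a loop: without it, thread 1 could overwrite `d[1]` for the next iteration while slower threads are still reading the old value. Every thread in the block must reach each `sync_threads()` call, which is why the barriers sit outside the `if` branches.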
