How to parallelize dual coordinate descent methods on GPU using CUDA.jl?

maleadt · June 1, 2020, 1:34pm

You will need to optimize a little for the GPU's architecture. For example, your kernel seems to be loading a lot of identical values in each thread; it is better to load those values in a single thread and cache them in shared memory. Also be sure to use @inbounds where possible, as bounds-checking branches are much more expensive on the GPU.
Generally, only a few embarrassingly parallel algorithms get easy speed-ups when parallelized like that, and even then it often depends on memory pressure and arithmetic intensity. Beyond that, you will need to optimize for the architecture.
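
To make the shared-memory and @inbounds advice concrete, here is a minimal sketch. It is not the kernel from the question (that code is not shown here); it assumes a stand-in problem, a matrix-vector product y = X * w, in which every thread needs the whole vector w. The names `TILE`, `matvec_kernel!` and `matvec_shared!` are made up for illustration. Instead of a single thread doing the load, each thread of a block copies one element of w into shared memory, so each value is still fetched from global memory only once per block. The sketch uses `CuStaticSharedArray`, which is the API in recent CUDA.jl releases; older releases used the `@cuStaticSharedMem` macro for the same purpose.

```julia
using CUDA

# Hypothetical block size; it is also the length of the shared-memory tile,
# so the kernel must be launched with exactly TILE threads per block.
const TILE = 256

# Computes y[i] = sum_j X[i, j] * w[j], one thread per row i.
# All threads need the same w, so it is staged through shared memory.
function matvec_kernel!(y, X, w)
    tid = threadIdx().x
    i = (blockIdx().x - 1) * blockDim().x + tid
    d = length(w)
    w_tile = CuStaticSharedArray(Float32, TILE)

    acc = 0.0f0
    base = 0
    while base < d
        # Cooperative load: each thread copies one element of w.
        if base + tid <= d
            @inbounds w_tile[tid] = w[base + tid]
        end
        sync_threads()  # the tile is complete before anyone reads it

        if i <= length(y)
            n = min(TILE, d - base)
            for j in 1:n
                @inbounds acc += X[i, base + j] * w_tile[j]
            end
        end
        sync_threads()  # everyone is done before the tile is overwritten
        base += TILE
    end

    if i <= length(y)
        @inbounds y[i] = acc
    end
    return nothing
end

# Host-side wrapper that picks the launch configuration.
function matvec_shared!(y::CuVector{Float32}, X::CuMatrix{Float32}, w::CuVector{Float32})
    @cuda threads=TILE blocks=cld(length(y), TILE) matvec_kernel!(y, X, w)
    return y
end
```

Calling `matvec_shared!(y, X, w)` with `CuArray` arguments should give the same result as `X * w`. Both `sync_threads()` calls sit outside the `i <= length(y)` branch on purpose, so that threads past the end of y still reach the barriers. Whether this actually beats the naive version depends on the sizes involved, so it is worth benchmarking both.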
