CUDA.jl kernel is half as fast as c++ Kernel

maleadt · September 22, 2022, 8:30pm

Yeah, it’s doable. You should inspect the LLVM IR to find out whether there are still any exceptions or bad conversions lurking. For example, it’s easy to accidentally promote to Int64 or Float64, which inflates register usage. Have a look at https://github.com/JuliaComputing/Training/blob/master/AdvancedGPU/2-2-kernel_analysis_optimization.ipynb. You can inspect the number of registers by compiling the kernel with launch=false and calling CUDA.registers on it.
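To make that concrete, here is a rough sketch of the workflow, not the kernel from this thread. The two kernels (`scale_promoting!` and `scale_f32!`) and the array sizes are made up for illustration; the only CUDA.jl pieces used are `@cuda launch=false`, `CUDA.registers` and `@device_code_llvm`. It assumes a working CUDA-capable GPU.

```julia
using CUDA

# Hypothetical kernel: the literal 2.0 is a Float64, so the Float32 input is
# promoted to Float64 for the multiply and converted back on the store.
function scale_promoting!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = 2.0 * x[i]
    end
    return nothing
end

# Same hypothetical kernel with a Float32 literal, so the arithmetic stays
# in Float32.
function scale_f32!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = 2f0 * x[i]
    end
    return nothing
end

x = CUDA.rand(Float32, 1024)
y = similar(x)

# Compile without launching; this returns a kernel object we can query.
k_bad  = @cuda launch=false scale_promoting!(y, x)
k_good = @cuda launch=false scale_f32!(y, x)

# Registers used per thread by each compiled kernel.
@show CUDA.registers(k_bad)
@show CUDA.registers(k_good)

# Dump the LLVM IR and look for double-typed operations (fpext, fmul double,
# fptrunc) or calls into exception-throwing code.
@device_code_llvm @cuda launch=false scale_promoting!(y, x)

# The compiled kernel object can still be launched afterwards.
k_good(y, x; threads=256, blocks=cld(length(y), 256))
```

For a kernel this small the register counts may well come out the same, so don't read too much into the numbers here. The point is the inspection loop: compile with launch=false, check CUDA.registers, then read the IR for conversions you didn't intend.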
