It seems something isn't working the way you describe. My kernel is quite a bit faster with two separate calls to sin and cos than with cis. Furthermore, when using cis there are references to double in the IR that aren't there when using sin and cos.
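For reference, here's roughly how I'm comparing the two (a minimal sketch, not my actual kernel — the names and launch configuration are illustrative, and I'm assuming CUDA.jl's `@device_code_llvm` for inspecting the generated IR):

```julia
using CUDA

# Variant 1: separate sin and cos calls (illustrative kernel).
function kernel_sincos!(out_s, out_c, x)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(x)
        @inbounds out_s[i] = sin(x[i])
        @inbounds out_c[i] = cos(x[i])
    end
    return nothing
end

# Variant 2: a single cis call (illustrative kernel).
function kernel_cis!(out, x)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(x)
        @inbounds out[i] = cis(x[i])
    end
    return nothing
end

x = CUDA.rand(Float32, 1024)
out = CUDA.zeros(ComplexF32, 1024)

# Dump the device LLVM IR; this is where the stray double references show up.
@device_code_llvm @cuda threads=256 blocks=4 kernel_cis!(out, x)
```

Grepping the IR dump of the cis variant for `double` is how I noticed the difference.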
I think I roughly understand what you're saying about range reduction, but it doesn't quite make sense to me. If I'm doing my computation in Float32, then I'm accepting the accuracy trade-off by choosing it. Transparently converting to higher precision and back behind the scenes seems bad, especially on co-processors like CUDA devices, where double-precision throughput is often much lower.
Interesting thought about writing my own kernel for sin and cos, but that seems like overkill, and it's a bit silly to have to do that just to match C++ performance.
Thanks for your thoughtful replies!