Hi all! I was wondering: is there any qualifier in CUDA.jl, like `__device__` in CUDA/C++, for calling a function from the device, to make sure the call does not fall back to the CPU? For example:
```julia
using CUDA

function gpu_add!(a, b, c)
    i = threadIdx().x
    a[i] = func(b[i], c[i])  # func is an ordinary function defined elsewhere (on the CPU)
    return
end
```
If `func` is originally defined on the CPU, will launching this `gpu_add!` kernel fall back to the CPU when calling `func`? If it is going to fall back, is there a macro equivalent to `__device__` in CUDA/C++ to restrict a function to running solely on the GPU? Thanks!

For various reasons, there are many cases like this in my project, but they are all scalar numerical functions (i.e., there is no way to parallelize them further). If falling back is unavoidable, will it significantly affect performance? Thanks again!
As far as I'm aware, in a kernel everything executes on the GPU. I believe CUDA.jl has an internal macro `@device_function` that is the analogue of `__device__`.
Thanks! So as a user of CUDA.jl, can we assume that everything within a kernel runs on the GPU, without worrying about accidentally falling back to the CPU?
I believe that as long as you call a function with `@cuda ...`, it will be fully compiled by GPUCompiler.jl, and hence any operation that would be infeasible on the GPU will lead directly to a compiler error. This includes function calls inside that function, which will all be compiled for the GPU as well.
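To make that concrete, here is a minimal sketch (the helper name `clamp_add` and the array sizes are made up for illustration): an ordinary Julia function called from a kernel is compiled for the GPU along with the kernel itself, with no CPU fallback involved.

```julia
using CUDA

# An ordinary scalar function -- nothing marks it as CPU- or GPU-specific.
clamp_add(x, y) = min(x + y, one(x))

function gpu_add!(a, b, c)
    i = threadIdx().x
    a[i] = clamp_add(b[i], c[i])  # compiled for the GPU together with the kernel
    return
end

a = CUDA.zeros(Float32, 32)
b = CUDA.rand(Float32, 32)
c = CUDA.rand(Float32, 32)

# The whole call graph, including clamp_add, goes through GPUCompiler.jl here.
@cuda threads=32 gpu_add!(a, b, c)
```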
(Not 100% sure, but maybe it is better to think of a function definition as something independent of the CPU or GPU: Julia compiles different versions (method instances) of a function depending on the inputs and the context in which it is called. Therefore, it might happen that `func` is never compiled for the CPU at all if it is only used inside the GPU kernel.)
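You can see this with CUDA.jl's introspection macros, if I remember their names correctly (`twice` and `twice_kernel!` below are made-up examples): the same generic function yields one native-CPU specialization and a separate PTX specialization for the device.

```julia
using CUDA
using InteractiveUtils  # for @code_native

twice(x) = 2x  # one generic definition, no CPU/GPU annotation

function twice_kernel!(out, xs)
    i = threadIdx().x
    out[i] = twice(xs[i])
    return
end

xs = CUDA.rand(Float32, 8)
out = similar(xs)

@code_native twice(1.0f0)  # CPU machine code for the Float32 method instance

# PTX for the GPU instance of the very same function:
CUDA.@device_code_ptx @cuda threads=8 twice_kernel!(out, xs)
```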
Yes, everything called within a kernel will be executed on the GPU, and if we are unable to compile it, we will throw an error.
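For instance, a kernel that does something the GPU backend cannot support, such as allocating a CPU `Array`, fails at compile time rather than silently running on the CPU (a sketch; the exact error type may vary across CUDA.jl versions):

```julia
using CUDA

function bad_kernel!(a)
    a[1] = sum(rand(10))  # rand(10) allocates a CPU Array -- unsupported in device code
    return
end

# Throws during GPU compilation (e.g. a KernelError/InvalidIRError),
# instead of falling back to the CPU at run time:
@cuda bad_kernel!(CUDA.zeros(Float32, 1))
```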