Hi all! I was wondering: is there any qualifier in CUDA.jl, like `__device__` in CUDA/C++, for calling a function from the device, to make sure the call does not fall back to the CPU? For example:
```julia
using CUDA

function gpu_add!(a, b, c)
    i = threadIdx().x
    a[i] = func(b[i], c[i])  # func is an ordinary function defined elsewhere (on the CPU)
    return
end
```
If `func` is originally defined on the CPU, will launching this `gpu_add!` kernel fall back to the CPU when calling `func`? If it is going to fall back, is there a macro equivalent to `__device__` in CUDA/C++ to restrict a function to running solely on the GPU? Thanks!

For various reasons, there are many cases like this in my project, but they are all scalar numerical functions (i.e., there is no way to parallelize them further). If falling back is unavoidable, will it significantly affect performance? Thanks again!
As far as I'm aware, in a kernel everything executes on the GPU. I believe CUDA.jl has an internal macro `@device_function` that is the analogue of `__device__`.
Thanks! So as a user of CUDA.jl, can we assume that everything within a kernel runs on the GPU, without worrying about accidentally falling back to the CPU?
I believe that as long as you call a function with `@cuda ...`, it will be fully compiled by GPUCompiler.jl, and hence any operation that would be infeasible on the GPU will lead directly to a compiler error. This includes function calls inside that function, which will all be compiled for the GPU as well.
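To make that concrete, here is a minimal sketch (the helper name `clamp_add` and the array sizes are made up for illustration): an ordinary Julia function called from a kernel is compiled for the GPU along with the kernel itself, with no CPU fallback involved.

```julia
using CUDA

# An ordinary scalar function -- nothing marks it as CPU- or GPU-specific.
clamp_add(x, y) = min(x + y, one(x))

function gpu_add!(a, b, c)
    i = threadIdx().x
    a[i] = clamp_add(b[i], c[i])  # compiled for the GPU together with the kernel
    return
end

a = CUDA.zeros(Float32, 32)
b = CUDA.rand(Float32, 32)
c = CUDA.rand(Float32, 32)

# The whole call graph, including clamp_add, goes through GPUCompiler.jl here.
@cuda threads=32 gpu_add!(a, b, c)
```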
(Not 100% sure, but maybe it is better to think of a function definition as something independent of the CPU or GPU: Julia compiles different versions (method instances) of a function depending on the inputs and the context in which it is called. Therefore, it might happen that `func` is never compiled for the CPU at all if it is only used inside the GPU kernel.)
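You can see this with CUDA.jl's introspection macros, if I remember their names correctly (`twice` and `twice_kernel!` below are made-up examples): the same generic function yields one native-CPU specialization and a separate PTX specialization for the device.

```julia
using CUDA
using InteractiveUtils  # for @code_native

twice(x) = 2x  # one generic definition, no CPU/GPU annotation

function twice_kernel!(out, xs)
    i = threadIdx().x
    out[i] = twice(xs[i])
    return
end

xs = CUDA.rand(Float32, 8)
out = similar(xs)

@code_native twice(1.0f0)  # CPU machine code for the Float32 method instance

# PTX for the GPU instance of the very same function:
CUDA.@device_code_ptx @cuda threads=8 twice_kernel!(out, xs)
```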
Yes, everything called within a kernel will be executed on the GPU, and if we are unable to compile it, we will throw an error.
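For instance, a kernel that does something the GPU backend cannot support, such as allocating a CPU `Array`, fails at compile time rather than silently running on the CPU (a sketch; the exact error type may vary across CUDA.jl versions):

```julia
using CUDA

function bad_kernel!(a)
    a[1] = sum(rand(10))  # rand(10) allocates a CPU Array -- unsupported in device code
    return
end

# Throws during GPU compilation (e.g. a KernelError/InvalidIRError),
# instead of falling back to the CPU at run time:
@cuda bad_kernel!(CUDA.zeros(Float32, 1))
```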