Calling a CUDA kernel in an external library

Is it possible to call a precompiled CUDA kernel from Julia?

I am writing CUDA kernels in C++ and compiling with nvcc, for use in a C++ application. I use Julia to test the code. Currently I write host-side wrapping functions that I call from Julia using ccall. I would like to avoid the wrapping functions and call the kernels directly from Julia.

Is this possible?


If you can compile to a PTX file (nvcc -ptx), you can load the file with CuModule, look up the kernel, and launch it with cudacall. So pretty low level. It should also be possible to extract the PTX code from the fatbin, but we don’t have any functionality for that.
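To make that concrete, here is a rough sketch of what the load-and-launch could look like with CUDA.jl's low-level API, assuming the kernel from the example below was compiled with nvcc -ptx and declared extern "C" __global__ so its symbol name is not mangled (untested here, needs a GPU):

```julia
using CUDA

# Load the PTX module produced by `nvcc -ptx example_kernel.cu`
md = CuModuleFile("example_kernel.ptx")

# Look up the kernel by its (unmangled) symbol name
fun = CuFunction(md, "example_kernel")

# Device buffer matching the kernel's int* argument
data = CuArray(Cint[1, 2, 3, 0])

# cudacall converts the arguments and launches the kernel
cudacall(fun, Tuple{CuPtr{Cint}}, pointer(data); threads=1, blocks=1)

synchronize()
Array(data)  # data[4] should now hold 1 + 2 * 3 = 7
```

If the kernel is not declared extern "C", you would need to look it up by its C++-mangled name instead (e.g. via nm or cuobjdump on the PTX).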

Thanks, I might try that. Do you know if the low-level code is generated using Nvidia's ptxas in that case, or with some other technique?

I am also considering using cudaLaunchKernel(…) from the CUDA runtime, which might allow me to use an already compiled kernel. Do you see any pitfalls with this?

That won’t work, it expects a certain layout of the binary in order to look up the compiled function. The Julia binary doesn’t have that, and we don’t emit our code like that. Unless you want to call cudaLaunchKernel from a pre-compiled binary, but then you’re in the same boat (having to ccall an external library).

Hmm, I’m confused.

A small example:

__global__
void example_kernel(int *data)
{
    data[3] = data[0] + data[1] * data[2];
}

extern "C"
void example_wrapper(int *data)
{
    int *data_cuda;
    size_t size = 4 * sizeof(int);
    cudaMalloc(&data_cuda, size);
    cudaMemcpy(data_cuda, data, size, cudaMemcpyHostToDevice);
    example_kernel<<<1,1>>>(data_cuda);
    cudaDeviceSynchronize();
    cudaMemcpy(data, data_cuda, size, cudaMemcpyDeviceToHost);
    cudaFree(data_cuda);
}

I compile this using nvcc -Xcompiler -fPIC --shared example_kernel.cu -o example_kernel.so
And then in Julia:

julia> data=Cint[1,2,3,0]
4-element Array{Int32,1}:
 1
 2
 3
 0

julia> ccall((:example_wrapper,"./example_kernel.so"),Cvoid,(Ptr{Cint},),pointer(data))

julia> data
4-element Array{Int32,1}:
 1
 2
 3
 7

The shared object example_kernel.so should now contain the kernel code in the format that the CUDA runtime expects. So I would guess that cudaLaunchKernel(...) should be able to launch it, if I’m able to invoke it from within Julia.

Am I wrong?

With what arguments? cudaLaunchKernel takes a function pointer, which is resolved within the executing application, and AFAIK that depends on the executable having specific symbols and state set up.

Fair point, I don’t know how to get that function pointer. Maybe I can create a single C function that does it for me. Will investigate and come back. Thanks for the feedback.
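For reference, one shape such a helper could take (a hedged sketch, not a tested solution; the file name example_launch.cu and the helper names are made up): within a translation unit compiled by nvcc, the host-side symbol of a __global__ function is a valid handle for cudaLaunchKernel, so a single extern "C" function can expose it or launch it generically:

```cpp
// example_launch.cu -- compile with:
//   nvcc -Xcompiler -fPIC --shared example_launch.cu -o example_launch.so
#include <cuda_runtime.h>

__global__ void example_kernel(int *data)
{
    data[3] = data[0] + data[1] * data[2];
}

extern "C" {

// Hand the host-side kernel handle back to the caller. This pointer is
// only meaningful inside nvcc-produced binaries, which embed the fatbin
// registration code the runtime needs to resolve it.
void *example_kernel_ptr(void)
{
    return (void *)example_kernel;
}

// Minimal launcher: ccall this from Julia with a device pointer.
cudaError_t example_launch(int *data_cuda)
{
    void *args[] = { &data_cuda };
    return cudaLaunchKernel((void *)example_kernel,
                            dim3(1), dim3(1), args, 0, nullptr);
}

}
```

This still means ccall-ing into an nvcc-compiled library, so as noted above it does not really escape the wrapper; it just shrinks the wrapper down to one generic entry point per kernel.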