Declaration of CUDA variables in module results in undefined reference

Hi everyone,

I am working on a project that makes intensive use of GPU computation, and we wanted some module constants to already live in GPU memory. Apparently, though, the memory is freed after the module has been defined: accessing such a constant results in an undefined reference error (if you access it from the CPU) or undefined data (when it is accessed from a GPU kernel).

I tried to replicate it in the REPL but it doesn’t happen there:

module CUDAtest
using CUDA
const A = Vector{Float32}([1f0, 2f0, 3f0])
const B = CuArray(A)
end

In this case I can access both CUDAtest.A and CUDAtest.B without issues, but when I try to access the constants in my other module, only the one located on the CPU works. For the constant located on the GPU, I can query its type, size and length (which suggests that the pointer is there), but trying to print it throws the following error:

julia> Common.CU_BLUE
5-element CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
Error showing value of type CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
ERROR: UndefRefError: access to undefined reference

With CPU arrays it works fine, so do you know whether this is expected behaviour or a bug?

Thanks in advance,

CPU arrays can be serialized into the precompilation image, but GPU arrays cannot: a new context is created when the process starts, so whatever was allocated on the device during precompilation is no longer there. So you essentially cannot have global GPU arrays in a module; you'll need to initialize them at the start of your application.
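A minimal sketch of that kind of initialization, assuming the usual `__init__` hook is an acceptable place to do it. The module name `Common` and the name `CU_BLUE` come from the error output above; `BLUE` and its values are made up for illustration. The CPU array stays a normal constant, and the GPU copy is created each time the module is loaded:

```julia
module Common

using CUDA

# CPU constant: safe to store in the precompilation image.
# (Hypothetical values, only here to have 5 elements.)
const BLUE = Float32[0.0, 0.0, 1.0, 0.5, 0.25]

# Typed, initially unassigned container for the GPU copy.
# Nothing is allocated on the device at precompile time.
const CU_BLUE = Ref{CuArray{Float32, 1}}()

# __init__ runs every time the module is loaded into a process,
# so the upload happens in the context of that process.
function __init__()
    if CUDA.functional()
        CU_BLUE[] = CuArray(BLUE)
    end
end

end
```

With this layout the GPU array is read as `Common.CU_BLUE[]` rather than `Common.CU_BLUE`, and `isassigned(Common.CU_BLUE)` tells you whether the upload actually happened (it is skipped when no usable GPU is found).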

Thanks for your answer. It makes sense, but it is somewhat inconsistent with the general behaviour, and also deceptive from the user’s POV. I would rather have an error/warning than this. Another option might be serializing a CPU version of the array and initializing it automatically upon the loading of the module, although that will require significantly more effort.

The problem is that, AFAIK, no methods are called to (de)serialize these objects, so there is no clear-cut point at which to throw an error. I haven't taken a proper look at the issue though, so if you have any thoughts feel free to open an issue with details, or a PR if you have the time.
