As the backtrace is trying to tell you, CUDA.jl is wrapping your allocation as “CUDA.HostMemory” because you are passing it as a Ptr, but cudaMalloc returns a device pointer.
I see from the comments that you tried CuPtr{Float32}. That is the right type to use here. You might need to do reinterpret(CuPtr{Float32}, device_address), or you can declare the device pointer type directly as the ccall return type:
    function get_device_address()
        return ccall((:get_device_address, "libcfunction"), CuPtr{Float32}, ())
    end
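Once the pointer is typed as a CuPtr, wrapping it should give you a device array rather than host memory. Below is a minimal sketch of both routes mentioned above, assuming the C library is called libcfunction and exports get_device_address as in your post. The helper names as_device_pointer and wrap_device_buffer, and the element count n, are hypothetical: the C side does not report the buffer length, so you have to supply it yourself.

```julia
using CUDA

# Assumption: libcfunction exports `float* get_device_address(void)` and the
# returned memory was allocated with cudaMalloc (names taken from the question).
function get_device_address()
    return ccall((:get_device_address, "libcfunction"), CuPtr{Float32}, ())
end

# Hypothetical helper: if the address already arrived as a plain Ptr,
# relabel it as a device pointer. No data is copied or touched.
as_device_pointer(p::Ptr{Float32}) = reinterpret(CuPtr{Float32}, p)

# Hypothetical helper: view `n` Float32 elements at `ptr` as a CuArray.
# own=false leaves ownership with the C library, so Julia never frees it.
function wrap_device_buffer(ptr::CuPtr{Float32}, n::Integer)
    return unsafe_wrap(CuArray, ptr, (Int(n),); own=false)
end
```

Usage would look like A = wrap_device_buffer(get_device_address(), n), with n matching the size passed to cudaMalloc on the C side. The C library must keep the allocation alive for as long as A is in use.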
The omlins/libdiffusion repository on GitHub, a proof of concept of a C-callable, GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics, might be a good reference.