Julia with CuArray issue

As the backtrace is trying to tell you, CUDA.jl is wrapping your allocation as "CUDA.HostMemory" because you are passing it as a Ptr. But cudaMalloc returns a device pointer, not a host pointer.

I see from the comments that you tried CuPtr{Float32}; that is the right type to use here. You might need to do reinterpret(CuPtr{Float32}, device_address).
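If your existing ccall still declares Ptr{Float32} as the return type, the reinterpret route might look roughly like the sketch below. It assumes, as in the question, a C library "libcfunction" exporting get_device_address, which returns memory allocated with cudaMalloc. The variable name raw is just for illustration.

```julia
using CUDA

# Assumption: "libcfunction" exports `float *get_device_address(void)`,
# which returns memory allocated on the C side with cudaMalloc.
raw = ccall((:get_device_address, "libcfunction"), Ptr{Float32}, ())

# Same address bits, but now typed as a device pointer, so CUDA.jl will not
# treat the allocation as host memory when it is wrapped.
device_address = reinterpret(CuPtr{Float32}, raw)
```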

Or you can just declare the return type as CuPtr{Float32} directly in the ccall:

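using CUDA  # provides CuPtr

# Declaring the return type as CuPtr{Float32} tells CUDA.jl this is a device pointer
function get_device_address()
    return ccall((:get_device_address, "libcfunction"), CuPtr{Float32}, ())
end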
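Once the pointer is typed as a CuPtr, it can be wrapped as a CuArray without copying. A minimal sketch, assuming the C side allocated n Float32 values (n = 1024 here is a hypothetical size that must match the cudaMalloc call) and that the C code remains responsible for freeing the buffer:

```julia
using CUDA

# Assumption: "libcfunction" exports `float *get_device_address(void)`,
# returning a cudaMalloc'd buffer of `n` Float32 values.
n = 1024  # hypothetical; must match the size allocated on the C side

# Declaring the return type as CuPtr{Float32} marks the address as device memory.
d_ptr = ccall((:get_device_address, "libcfunction"), CuPtr{Float32}, ())

# Wrap the existing allocation without copying. own=false means Julia will
# never free it; the C side stays responsible for calling cudaFree.
A = unsafe_wrap(CuArray, d_ptr, n; own=false)

A .= 2f0        # broadcast assignment runs as a GPU kernel
total = sum(A)  # GPU reduction; the scalar result is returned to the host
```

Passing own=true would instead let CUDA.jl free the buffer when the array is finalized, which is only safe if the C code never frees it itself.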

The GitHub repository omlins/libdiffusion, a proof of concept of a C-callable, GPU-enabled, parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics, might be a good reference.