CUDAnative: register host memory for pinned memory access

Your use of "register" is confusing: do you want pinned memory and an async memcpy, or do you want to register an existing host pointer and map it into the device address space?

Here’s an example of the latter:

julia> A = zeros(nx);

julia> A_cpuptr = pointer(A)
Ptr{Float64} @0x00007f360f7ff040

julia> A_buf = Mem.register(Mem.Host, A_cpuptr, sizeof(A), Mem.HOSTREGISTER_DEVICEMAP)
CUDAdrv.Mem.HostBuffer(Ptr{Nothing} @0x00007f360f7ff040, 8388608, CuContext(Ptr{Nothing} @0x000000000255dc70, false, true), true)

julia> A_gpuptr = convert(CuPtr{Float64}, A_buf)
CuPtr{Float64}(0x0000000202c40040)

julia> A_d = unsafe_wrap(CuArray, A_gpuptr, size(A));


# prove that the device mapping works

julia> A[1] = 42
42

julia> A_d[1]
42.0

A_d is now a device array backed by a CPU memory allocation. Accessing that memory from the GPU is pretty expensive though, since every access has to go over the PCIe bus.
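
If the GPU touches the data more than once, an explicit copy into device memory may be cheaper than repeated mapped accesses. The following is only a sketch continuing the session above. It assumes CuArrays is loaded (as it must be for the unsafe_wrap call) and that your CUDAdrv version provides Mem.unregister for buffers created with Mem.register; I have not run it against this exact setup.

```julia
julia> B_d = CuArray(A);        # one bulk host-to-device copy; B_d lives in device memory

julia> A[1] = 0;                # unlike A_d, B_d does not see later writes to A

julia> Mem.unregister(A_buf)    # assumption: undo the registration; do not use A_d after this
```

The trade-off is that B_d is a snapshot: changes on either side have to be copied back and forth explicitly, whereas the mapped A_d always refers to the same memory as A.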