CUDAnative: register host memory for pinned memory access

The register call in my example is cudaHostRegister, and the conversion to a CuPtr performs the cudaHostGetDevicePointer, so that should be enough for you to implement your application. I don’t have time to look into your CUDA C code myself; you should use the CUDA profiler to figure out what’s wrong. It has an API trace mode that prints all API calls, so you can check whether there are any mismatches.
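For reference, here is a minimal CUDA C sketch of the zero-copy path that those two calls correspond to on the runtime API side. It is not the code from the example being discussed: the kernel scale_in_place and the helper zero_copy_demo are hypothetical names, and the sketch assumes a device that can map host memory.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one element through the mapped pointer.
__global__ void scale_in_place(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: returns true if all CUDA calls succeeded and the host
// buffer holds the doubled values afterwards.
// Assumption: the device supports mapped host memory (canMapHostMemory).
bool zero_copy_demo(size_t n) {
    float *host = (float *)malloc(n * sizeof(float));
    if (!host) return false;
    for (size_t i = 0; i < n; ++i) host[i] = 1.0f;

    bool ok = false;
    float *dev_view = nullptr;
    // Page-lock the existing allocation and map it into the device address space.
    if (cudaHostRegister(host, n * sizeof(float), cudaHostRegisterMapped) == cudaSuccess) {
        // Obtain the device-side pointer that aliases the same host memory.
        if (cudaHostGetDevicePointer((void **)&dev_view, host, 0) == cudaSuccess) {
            unsigned threads = 256;
            unsigned blocks = (unsigned)((n + threads - 1) / threads);
            // The kernel reads and writes host memory directly; nothing is copied.
            scale_in_place<<<blocks, threads>>>(dev_view, n);
            ok = (cudaGetLastError() == cudaSuccess) &&
                 (cudaDeviceSynchronize() == cudaSuccess);
            for (size_t i = 0; ok && i < n; ++i) ok = (host[i] == 2.0f);
        }
        cudaHostUnregister(host);
    }
    free(host);
    return ok;
}
```

Calling zero_copy_demo(1 << 20) from a small main and checking the return value is enough to confirm that the register and device-pointer calls work on a given system.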

Also, AFAIK, although cudaHostRegister gives you page-locked memory, using cudaHostGetDevicePointer for zero-copy memory will not yield high performance: the GPU will then access host memory directly on every read and write. You probably want an async memcpy to get DMA transfers.
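As a rough illustration of that alternative, the following CUDA C sketch registers an ordinary host allocation and then copies it to a device buffer with cudaMemcpyAsync on a stream. The helper name async_h2d_demo and the fill pattern are made up for the example; it only shows the call sequence and makes no claim about the throughput you will measure.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the pinned host-to-device copy succeeded.
bool async_h2d_demo(size_t bytes) {
    void *host = malloc(bytes);
    if (!host) return false;
    memset(host, 0x2a, bytes);  // arbitrary fill pattern for the example

    void *dev = nullptr;
    cudaStream_t stream = nullptr;
    bool registered = false, ok = false;

    // Page-lock the existing allocation so the copy can run as a DMA transfer.
    if (cudaHostRegister(host, bytes, cudaHostRegisterDefault) == cudaSuccess) {
        registered = true;
        if (cudaMalloc(&dev, bytes) == cudaSuccess &&
            cudaStreamCreate(&stream) == cudaSuccess) {
            // Enqueue the copy on the stream, then wait for it to finish.
            ok = cudaMemcpyAsync(dev, host, bytes,
                                 cudaMemcpyHostToDevice, stream) == cudaSuccess &&
                 cudaStreamSynchronize(stream) == cudaSuccess;
        }
    }

    // Release resources in reverse order of acquisition.
    if (stream) cudaStreamDestroy(stream);
    if (dev) cudaFree(dev);
    if (registered) cudaHostUnregister(host);
    free(host);
    return ok;
}
```

Unlike the zero-copy path, kernels launched after this copy work on the device buffer, so each element crosses the bus once instead of on every access.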

I implemented the register function by copying the code from your example exactly, and it works, so that is no longer the issue. The question now is just why the performance of h2d transfers with Julia is not as expected. Could anyone else give advice?

Thanks,

Sam

I gave you some pointers; please analyze your code yourself with e.g. nvprof. There are no straightforward answers at this point anymore, as the API interactions are mostly identical. Case in point: on my system the Julia version is faster. Just use the existing memcpy, which should be much faster. Why would you even want to implement this yourself?

@maleadt, first of all, I would like to thank you for the amazing work that you are doing with CUDAdrv / CUDAnative / CuArrays. I truly believe that with packages like yours Julia can enable a new era of supercomputing, where the “two language problem” can be solved and prototype and production code can become one and the same. I am fully aware that developing and supporting these packages at the same time is a herculean task. So, when I ask other people for advice, it is in no way to express unhappiness with your support (which is, BTW, incredibly fast and efficient!), but rather to try to get other people involved in order to lower the load on you. Thanks again for everything. I will see if I can figure out something with nvprof.

Thanks. My comment wasn’t ill-intended; I just meant to say that for such a specific problem you probably can’t rely on other people to know what’s up (without them actually profiling the code). So it would be good to do a little digging first and report the results here.

When I try to run this example on Julia 1.6.2 using CUDA.jl > 3.4.0, I get the error message: “ERROR: Could not identify the buffer type; are you passing a valid CUDA pointer to unsafe_wrap?”. In lower versions it runs OK after replacing CUDAdrv with CUDA. What am I doing wrong?

@_micro: I see that you opened a CUDA.jl bug after a reply from @maleadt on Slack, so I am just linking the issue here:
https://github.com/JuliaGPU/CUDA.jl/issues/1125