Profiling Julia CUDA code missing 'CUDA HW'

I’m attempting to profile a custom CUDA kernel using NVIDIA’s Nsight Systems (i.e. nsys), as recommended by the CUDA.jl project.

nsys correctly gives me profiling results for nvcc-compiled code; however, when I run it against Julia, I only see CPU thread entries. The timeline is missing the CUDA HW entry, where I expect to see information about kernel execution, memory copies, etc.

Has anyone else experienced this, and is there a fix?

Are you actually profiling GPU code? Can you try with a simple CUDA.@profile CuArray([1]) .+ 1 or so?

Yes, I believe I am. I’ve reduced the code to this minimal example:

nsys profile julia -e 'using CUDA; arr = rand(100); arrd = CuArray(arr); map(arrd) do x; sin(x); end; println(Array(arrd))'

Actually, if I force --cuda-memory-usage=true, I do in fact get a “CUDA HW” header, but it does not include any information about the CUDA threads. See attached screenshot for an example.

Strange. Are there actual CUDA API calls in there? Can you share the .nsys-rep file?
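One way to check for CUDA API calls without opening the GUI is to request CUDA tracing explicitly and then print the report's summary tables. This is only a sketch: the output name `julia_cuda` is a placeholder, the reproducer is the one posted above, and the exact tables printed by `nsys stats` depend on the installed Nsight Systems version.

```shell
# Placeholder output name (an assumption); nsys appends the .nsys-rep extension.
REPORT=julia_cuda
# Reproducer from the post above, kept verbatim.
REPRO='using CUDA; arr = rand(100); arrd = CuArray(arr); map(arrd) do x; sin(x); end; println(Array(arrd))'

if command -v nsys >/dev/null 2>&1 && command -v julia >/dev/null 2>&1; then
    # Request CUDA tracing explicitly and overwrite any earlier report.
    nsys profile --trace=cuda --force-overwrite=true -o "$REPORT" julia -e "$REPRO"
    # Print the default summary reports for the captured data.
    nsys stats "$REPORT.nsys-rep"
else
    echo "nsys or julia not found on PATH" >&2
fi
```

If the summary lists API calls but no kernel or memory-copy activity, that matches the symptom described here: the launches are recorded, but the GPU-side trace is missing.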

Sure, please see the .nsys-rep file here:

Kernel launches are definitely there, but the CUDA profiling might not have been started correctly. is suspicious. Maybe try upgrading to the latest Nsight Systems.

Also, which version of CUDA.jl? Could you post the output of running under JULIA_DEBUG=CUDA?
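For reference, both pieces of information can be gathered roughly as follows. This is a sketch that assumes `julia` is on the PATH and reuses the reproducer from earlier in the thread; `CUDA.versioninfo()` reports the CUDA.jl, driver, and toolkit versions, and `JULIA_DEBUG=CUDA` turns on debug-level logging for the CUDA module.

```shell
# Reproducer from earlier in the thread, kept verbatim.
REPRO='using CUDA; arr = rand(100); arrd = CuArray(arr); map(arrd) do x; sin(x); end; println(Array(arrd))'

if command -v julia >/dev/null 2>&1; then
    # Report the CUDA.jl, driver, and toolkit versions in use.
    julia -e 'using CUDA; CUDA.versioninfo()'
    # Re-run the reproducer with CUDA.jl debug logging enabled.
    JULIA_DEBUG=CUDA julia -e "$REPRO"
else
    echo "julia not found on PATH" >&2
fi
```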

I can confirm it’s happening with other, non-Julia code as well. So this appears to be a system issue rather than something Julia-specific. I’ll try to get one of the HPC staff onto it.

Thanks for your help @maleadt Tim!