Failed to profile CUDA.jl with Nsight Systems 2024.1.1

I am following the docs to learn GPU profiling in Julia. When I ran
$ nsys launch julia
it started a Julia REPL, but it crashed as soon as I ran
using CUDA
with the following error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\ddt00\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at REPL[1]:1
nvtxGlobals_v3 at C:\Users\ddt00\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 1476205 (Pool: 1475266; Big: 939); GC: 2

My CUDA settings are:

julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6

Environment:
- JULIA_CUDA_NSYS: C:\Program Files\NVIDIA Corporation\Nsight Systems 2024.1.1\target-windows-x64\nsys.exe

1 device:
  0: NVIDIA GeForce GTX 1070 with Max-Q Design (sm_61, 7.886 GiB / 8.000 GiB available)

An issue already exists for this: "Crash on Windows" (Issue #37, JuliaGPU/NVTX.jl on GitHub). If you’re familiar with Windows development, it would be great if you could chime in there.

Hi @maleadt

Sorry, I’m not familiar with Windows development. I hope someone can solve it; that would be a great help for development.

I also tried Nsight Compute. It works for snippets of a few lines, but it gets stuck when I try to profile a large project (one that mixes CPU and GPU code). Is there an alternative way to profile in Julia right now? Or can I only profile my GPU code, which is just a small portion of the project?

If you need to profile both CPU and CUDA GPU code, Nsight Systems is really the tool to use. Maybe you could try Linux or WSL?
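
For reference, here is a minimal sketch of what that workflow could look like on Linux or WSL, assuming Nsight Systems and CUDA.jl are both installed there and Julia was started with `nsys launch julia`. The `step!` function is a made-up stand-in for your own mixed CPU/GPU code, not something from your project.

```julia
# Assumes this session was started from the shell with:  nsys launch julia
using CUDA

# Hypothetical workload that mixes GPU and CPU work; replace with your own code.
function step!(y, x)
    y .= sin.(x) .+ 1f0       # broadcast runs as a GPU kernel
    return sum(Array(y))      # copy the result back and reduce on the CPU
end

x = CUDA.rand(Float32, 1024, 1024)
y = similar(x)

step!(y, x)                   # warm-up call so compilation is not part of the profile

# external=true asks CUDA.jl to drive the external profiler (nsys)
# instead of using its built-in profiler.
CUDA.@profile external=true step!(y, x)
```

Once the `CUDA.@profile` call returns, nsys should write a report file that can be opened in the Nsight Systems GUI, also from a different machine.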

Thanks! I'll give it a try on Linux.