Failed to compile PTX code when using Nsight on Windows 11

using CUDA
A = CUDA.rand(100, 100)
sin.(A)

works in a normally started Julia process, but errors when Julia is started under NVIDIA Nsight Systems 2022.4.1 on Windows 11:

julia> sin.(A)
ERROR: Failed to compile PTX code (ptxas exited with code 3221225477)
ptxas info    : 252 bytes gmem
ptxas info    : Compiling entry function '_Z27julia_broadcast_kernel_605715CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE4_sinS4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EEEES6_' for 'sm_75'
ptxas info    : Function properties for _Z27julia_broadcast_kernel_605715CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE4_sinS4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EEEES6_
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 32 registers, 488 bytes cmem[0], 8 bytes cmem[2]
ptxas info    : Function properties for gpu_report_exception
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for gpu_signal_exception
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
If you think this is a bug, please file an issue and attach C:\Users\gonak\AppData\Local\Temp\jl_RwjgpUj8Hs.ptx
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:427
  [3] #224
    @ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
  [4] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\gonak\.julia\packages\GPUCompiler\07qaN\src\driver.jl:76
  [5] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
  [6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\gonak\.julia\packages\GPUCompiler\07qaN\src\cache.jl:90
  [7] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
  [8] cufunction
    @ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:292 [inlined]
  [9] macro expansion
    @ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
 [10] #launch_heuristic#248
    @ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
 [11] _copyto!
    @ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:63 [inlined]
 [12] copyto!
    @ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:46 [inlined]
 [13] copy
    @ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:37 [inlined]
 [14] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(sin), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast .\broadcast.jl:860
 [15] top-level scope
    @ REPL[11]:1
 [16] top-level scope
    @ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\initialization.jl:52
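
The ptxas exit code in the error above is easier to read in hexadecimal, since Windows reports NTSTATUS values as process exit codes. Here is a minimal sketch of decoding it. The lookup table and the `describe_exit_code` helper are names made up for illustration, and the table only lists a few well-known values. If the code maps to an access violation, that would suggest ptxas itself crashed rather than rejected the PTX, though that is only a reading of the exit code and not a confirmed diagnosis.

```julia
# Exit code reported by ptxas in the error message above.
exit_code = 3221225477

# Windows reports NTSTATUS values as unsigned 32-bit process exit codes.
status = UInt32(exit_code)
println("0x", string(status, base = 16, pad = 8))

# Hypothetical lookup table with a few well-known NTSTATUS values.
ntstatus_names = Dict{UInt32,String}(
    0xC0000005 => "STATUS_ACCESS_VIOLATION",
    0xC00000FD => "STATUS_STACK_OVERFLOW",
    0xC0000135 => "STATUS_DLL_NOT_FOUND",
)

# Hypothetical helper: map an exit code to a name, if it is in the table.
describe_exit_code(code::Integer) = get(ntstatus_names, UInt32(code), "unknown status")

println(describe_exit_code(exit_code))
```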

I had some trouble starting a Julia session under Nsight without it crashing immediately, but after following "Why I am not getting anything in REPL while trying to use Nsight Systems for profiling?" (post #2 by JGranada), I at least got that working.
Where should I file an issue? CUDA.jl or NVIDIA?
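
One way to narrow this down might be to run ptxas by hand on the dumped PTX file, once from a normal shell and once under Nsight, and compare the results. Here is a rough sketch. It assumes a `ptxas` executable is reachable on PATH, which the artifact installation does not necessarily provide. The `ptxas_command` helper is hypothetical and not part of CUDA.jl.

```julia
# Hypothetical helper: build (but do not run) a ptxas command line for a PTX file.
# Assumes `ptxas` names an executable on PATH; pass a full path otherwise.
function ptxas_command(ptx_path::AbstractString; arch::AbstractString = "sm_75",
                       ptxas::AbstractString = "ptxas")
    cubin_path = tempname() * ".cubin"   # scratch output file
    return `$ptxas --verbose --gpu-name $arch --output-file $cubin_path $ptx_path`
end

# PTX file named in the error message above.
cmd = ptxas_command(raw"C:\Users\gonak\AppData\Local\Temp\jl_RwjgpUj8Hs.ptx")
println(cmd)

# Uncomment to actually invoke ptxas; `ignorestatus` keeps a nonzero exit from throwing.
# proc = run(ignorestatus(cmd))
# println("ptxas exit code: ", proc.exitcode)
```

If the standalone run also fails only under Nsight, that would point away from CUDA.jl, but this is a guess about how to split the problem and not a verified procedure.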

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 48 × AMD Ryzen Threadripper 3960X 24-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 48 virtual cores
Environment:
  JULIA_CUDA_NSYS = C:\Program Files\NVIDIA Corporation\Nsight Systems 2022.4.1\target-windows-x64\nsys.exe
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 512.59.0, for CUDA 11.6
CUDA driver 11.6

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.1
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+512.59
- CUDNN: missing
- CUTENSOR: missing

Toolchain:
- Julia: 1.8.0
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

Environment:
- JULIA_CUDA_NSYS: C:\Program Files\NVIDIA Corporation\Nsight Systems 2022.4.1\target-windows-x64\nsys.exe

1 device:
  0: NVIDIA GeForce RTX 2080 SUPER (sm_75, 6.254 GiB / 8.000 GiB available)

File a bug on CUDA.jl, I’ll have a look. If I can reproduce, you should file with NVIDIA.
