The following works in a normally started Julia process, but errors when Julia is launched from NVIDIA Nsight Systems 2022.4.1 on Windows 11:

using CUDA
A = CUDA.rand(100, 100)
sin.(A)

Under Nsight the same broadcast fails:
julia> sin.(A)
ERROR: Failed to compile PTX code (ptxas exited with code 3221225477)
ptxas info : 252 bytes gmem
ptxas info : Compiling entry function '_Z27julia_broadcast_kernel_605715CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE4_sinS4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EEEES6_' for 'sm_75'
ptxas info : Function properties for _Z27julia_broadcast_kernel_605715CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE4_sinS4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EEEES6_
40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 488 bytes cmem[0], 8 bytes cmem[2]
ptxas info : Function properties for gpu_report_exception
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for gpu_signal_exception
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
If you think this is a bug, please file an issue and attach C:\Users\gonak\AppData\Local\Temp\jl_RwjgpUj8Hs.ptx
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:427
[3] #224
@ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
[4] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
@ GPUCompiler C:\Users\gonak\.julia\packages\GPUCompiler\07qaN\src\driver.jl:76
[5] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
[6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler C:\Users\gonak\.julia\packages\GPUCompiler\07qaN\src\cache.jl:90
[7] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
[8] cufunction
@ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:292 [inlined]
[9] macro expansion
@ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
[10] #launch_heuristic#248
@ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
[11] _copyto!
@ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:63 [inlined]
[12] copyto!
@ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:46 [inlined]
[13] copy
@ C:\Users\gonak\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:37 [inlined]
[14] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(sin), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}})
@ Base.Broadcast .\broadcast.jl:860
[15] top-level scope
@ REPL[11]:1
[16] top-level scope
@ C:\Users\gonak\.julia\packages\CUDA\DfvRa\src\initialization.jl:52
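If I read the exit code correctly, 3221225477 is 0xC0000005, i.e. ptxas itself dies with a Windows access violation, but only when Julia was started from Nsight. One thing I am considering is running ptxas by hand on the dumped PTX file from the error message, both from a normal terminal and from the Nsight-launched environment, to see whether ptxas alone reproduces the crash. A rough sketch (the ptxas path below is an assumption for a system CUDA 11.7 install; CUDA.jl itself uses the artifact toolkit, so the binary it actually invokes lives elsewhere):

ptx   = raw"C:\Users\gonak\AppData\Local\Temp\jl_RwjgpUj8Hs.ptx"
ptxas = raw"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\ptxas.exe"
run(`$ptxas -v -arch=sm_75 -o out.cubin $ptx`)   # -v prints the "ptxas info" resource-usage lines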
I had some trouble getting a Julia session started from Nsight that does not immediately crash, but following Why I am not getting anything in REPL while trying to use Nsight Systems for profiling? - #2 by JGranada I at least got that part working.
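For reference, once a REPL under Nsight is alive, what I actually intend to run is roughly the following (a minimal sketch; I am assuming that CUDA.@profile in this CUDA.jl version just toggles the external profiler, so it is only meaningful when the process was started under Nsight):

using CUDA
A = CUDA.rand(100, 100)
sin.(A)                 # warm-up / compilation: this is the call that currently crashes
CUDA.@profile sin.(A)   # the region I actually want to see in the Nsight timeline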
Where should I file an issue: with CUDA.jl or with NVIDIA?
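In case it is useful for a report, I can also dump the PTX for the same broadcast from a normally started session, where compilation succeeds. A minimal sketch, assuming CUDA.@device_code_ptx prints the code of the kernels compiled inside the expression:

using CUDA
A = CUDA.rand(100, 100)
CUDA.@device_code_ptx sin.(A)   # prints the PTX of the broadcast kernel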
julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 48 × AMD Ryzen Threadripper 3960X 24-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 1 on 48 virtual cores
Environment:
JULIA_CUDA_NSYS = C:\Program Files\NVIDIA Corporation\Nsight Systems 2022.4.1\target-windows-x64\nsys.exe
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 512.59.0, for CUDA 11.6
CUDA driver 11.6
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.1
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+512.59
- CUDNN: missing
- CUTENSOR: missing
Toolchain:
- Julia: 1.8.0
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
Environment:
- JULIA_CUDA_NSYS: C:\Program Files\NVIDIA Corporation\Nsight Systems 2022.4.1\target-windows-x64\nsys.exe
1 device:
0: NVIDIA GeForce RTX 2080 SUPER (sm_75, 6.254 GiB / 8.000 GiB available)