Hi all,
I ran into a strange issue today: it seems that activating one of my environments causes CUDA to throw "ArgumentError: Pass LowerSIMDLoop is not a module pass".
This environment contains a package I am developing. I haven't been able to come up with a short example that reproduces the error (I have no idea what might be causing it), but I'm willing to provide as much information as needed. The full error message is at the end of this post.
I appreciate any advice!
Julia versioninfo
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_NUM_THREADS = auto
CUDA versioninfo
CUDA runtime 12.4, artifact installation
CUDA driver 12.2
NVIDIA driver 535.171.4
CUDA libraries:
- CUBLAS: 12.4.5
- CURAND: 10.3.5
- CUFFT: 11.2.1
- CUSOLVER: 11.6.1
- CUSPARSE: 12.3.1
- CUPTI: 22.0.0
- NVML: 12.0.0+535.171.4
Julia packages:
- CUDA: 5.3.5
- CUDA_Driver_jll: 0.8.1+0
- CUDA_Runtime_jll: 0.12.1+0
Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7
1 device:
0: NVIDIA GeForce GTX 1060 6GB (sm_61, 4.189 GiB / 6.000 GiB available)
Packages in this environment
(SP2T) pkg> st
Project SP2T v1.0.0-DEV
Status ~/Dropbox (ASU)/Code/Julia/SP2T/Project.toml
[35d6a980] ColorSchemes v3.25.0
[861a8166] Combinatorics v1.0.2
[31c24e10] Distributions v0.25.108
[e9467ef8] GLMakie v0.10.2
[2ab3a3ac] LogExpFunctions v0.3.27
[872c559c] NNlib v0.9.17
[92933f4c] ProgressMeter v1.10.0
[276daf66] SpecialFunctions v2.4.0
[2913bbd2] StatsBase v0.34.3
[37e2e46d] LinearAlgebra
[9a3f8284] Random
Full error message
ArgumentError: Pass LowerSIMDLoop is not a module pass
Stacktrace:
[1] add!(pm::LLVM.NewPMModulePassManager, pb::LLVM.PassBuilder, pass::LLVM.Interop.LowerSIMDLoopPass)
@ LLVM ~/.julia/packages/LLVM/ShACK/src/newpm/passes.jl:701
[2] add!(pm::LLVM.NewPMModulePassManager, pass::LLVM.Interop.LowerSIMDLoopPass)
@ LLVM ~/.julia/packages/LLVM/ShACK/src/newpm/passes.jl:728
[3] buildNewPMPipeline!(mpm::LLVM.NewPMModulePassManager, job::GPUCompiler.CompilerJob, opt_level::Int64)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:37
[4] buildNewPMPipeline!(mpm::LLVM.NewPMModulePassManager, job::GPUCompiler.CompilerJob)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:34
[5] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:24 [inlined]
[6] macro expansion
@ ~/.julia/packages/LLVM/ShACK/src/base.jl:98 [inlined]
[7] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:23 [inlined]
[8] macro expansion
@ ~/.julia/packages/LLVM/ShACK/src/base.jl:98 [inlined]
[9] optimize_newpm!(job::GPUCompiler.CompilerJob, mod::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:22
[10] optimize!(job::GPUCompiler.CompilerJob, mod::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:5
[11] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:344 [inlined]
[12] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[13] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:343 [inlined]
[14] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[15] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:316 [inlined]
[16] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
[17] emit_llvm
@ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
[18] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:134
[19] codegen
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
[20] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
[21] compile
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
[22] #1116
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/compilation.jl:247 [inlined]
[23] JuliaContext(f::CUDA.var"#1116#1119"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[24] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[25] compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/B2Z5u/src/compiler/compilation.jl:246
[26] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
[27] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
[28] macro expansion
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:367 [inlined]
[29] macro expansion
@ ./lock.jl:267 [inlined]
[30] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}}, typeof(/), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Int64}}, Int64}}; kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:362
[31] cufunction
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:359 [inlined]
[32] macro expansion
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:112 [inlined]
[33] #launch_heuristic#1173
@ ~/.julia/packages/CUDA/B2Z5u/src/gpuarrays.jl:17 [inlined]
[34] launch_heuristic
@ ~/.julia/packages/CUDA/B2Z5u/src/gpuarrays.jl:15 [inlined]
[35] _copyto!
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:78 [inlined]
[36] copyto!
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:44 [inlined]
[37] copy
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:29 [inlined]
[38] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.Mem.DeviceBuffer}, Nothing, typeof(/), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Int64}})
@ Base.Broadcast ./broadcast.jl:903
[39] top-level scope
@ REPL[4]:1
[40] top-level scope
@ ~/.julia/packages/CUDA/B2Z5u/src/initialization.jl:209