Activating environment triggers error in CUDA

Hi all,

I ran into a strange issue today: activating one of my environments seems to trigger an "ArgumentError: Pass LowerSIMDLoop is not a module pass" error in CUDA.

The full error message is at the end of this post. This environment contains a package I am developing. I have not been able to come up with a short example that reproduces the error, since I have no idea what the cause might be, but I am happy to provide as much information as possible.

I appreciate any advice!

Julia versioninfo

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core™ i7-7700K CPU @ 4.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_NUM_THREADS = auto

CUDA versioninfo

CUDA runtime 12.4, artifact installation
CUDA driver 12.2
NVIDIA driver 535.171.4

CUDA libraries:

  • CUBLAS: 12.4.5
  • CURAND: 10.3.5
  • CUFFT: 11.2.1
  • CUSOLVER: 11.6.1
  • CUSPARSE: 12.3.1
  • CUPTI: 22.0.0
  • NVML: 12.0.0+535.171.4

Julia packages:

  • CUDA: 5.3.5
  • CUDA_Driver_jll: 0.8.1+0
  • CUDA_Runtime_jll: 0.12.1+0

Toolchain:

  • Julia: 1.10.3
  • LLVM: 15.0.7

1 device:
0: NVIDIA GeForce GTX 1060 6GB (sm_61, 4.189 GiB / 6.000 GiB available)

Packages in this environment

(SP2T) pkg> st
Project SP2T v1.0.0-DEV
Status ~/Dropbox (ASU)/Code/Julia/SP2T/Project.toml
[35d6a980] ColorSchemes v3.25.0
[861a8166] Combinatorics v1.0.2
[31c24e10] Distributions v0.25.108
[e9467ef8] GLMakie v0.10.2
[2ab3a3ac] LogExpFunctions v0.3.27
[872c559c] NNlib v0.9.17
[92933f4c] ProgressMeter v1.10.0
[276daf66] SpecialFunctions v2.4.0
[2913bbd2] StatsBase v0.34.3
[37e2e46d] LinearAlgebra
[9a3f8284] Random

Full error message

ArgumentError: Pass LowerSIMDLoop is not a module pass
Stacktrace:
[1] add!(pm::LLVM.NewPMModulePassManager, pb::LLVM.PassBuilder, pass::LLVM.Interop.LowerSIMDLoopPass)
@ LLVM ~/.julia/packages/LLVM/ShACK/src/newpm/passes.jl:701
[2] add!(pm::LLVM.NewPMModulePassManager, pass::LLVM.Interop.LowerSIMDLoopPass)
@ LLVM ~/.julia/packages/LLVM/ShACK/src/newpm/passes.jl:728
[3] buildNewPMPipeline!(mpm::LLVM.NewPMModulePassManager, job::GPUCompiler.CompilerJob, opt_level::Int64)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:37
[4] buildNewPMPipeline!(mpm::LLVM.NewPMModulePassManager, job::GPUCompiler.CompilerJob)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:34
[5] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:24 [inlined]
[6] macro expansion
@ ~/.julia/packages/LLVM/ShACK/src/base.jl:98 [inlined]
[7] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:23 [inlined]
[8] macro expansion
@ ~/.julia/packages/LLVM/ShACK/src/base.jl:98 [inlined]
[9] optimize_newpm!(job::GPUCompiler.CompilerJob, mod::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:22
[10] optimize!(job::GPUCompiler.CompilerJob, mod::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/optim.jl:5
[11] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:344 [inlined]
[12] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[13] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:343 [inlined]
[14] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[15] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:316 [inlined]
[16] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
[17] emit_llvm
@ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
[18] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:134
[19] codegen
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
[20] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
[21] compile
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
[22] #1116
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/compilation.jl:247 [inlined]
[23] JuliaContext(f::CUDA.var"#1116#1119"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[24] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[25] compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/B2Z5u/src/compiler/compilation.jl:246
[26] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
[27] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
[28] macro expansion
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:367 [inlined]
[29] macro expansion
@ ./lock.jl:267 [inlined]
[30] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}}, typeof(/), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Int64}}, Int64}}; kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:362
[31] cufunction
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:359 [inlined]
[32] macro expansion
@ ~/.julia/packages/CUDA/B2Z5u/src/compiler/execution.jl:112 [inlined]
[33] #launch_heuristic#1173
@ ~/.julia/packages/CUDA/B2Z5u/src/gpuarrays.jl:17 [inlined]
[34] launch_heuristic
@ ~/.julia/packages/CUDA/B2Z5u/src/gpuarrays.jl:15 [inlined]
[35] _copyto!
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:78 [inlined]
[36] copyto!
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:44 [inlined]
[37] copy
@ ~/.julia/packages/GPUArrays/OqrUV/src/host/broadcast.jl:29 [inlined]
[38] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.Mem.DeviceBuffer}, Nothing, typeof(/), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Int64}})
@ Base.Broadcast ./broadcast.jl:903
[39] top-level scope
@ REPL[4]:1
[40] top-level scope
@ ~/.julia/packages/CUDA/B2Z5u/src/initialization.jl:209

Can you post the versions of all packages in use (]st -m)? Also make sure you've updated the packages in your environment.
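
For reference, the same listing is available through the Pkg API, which can be convenient in a script or notebook. This is a minimal sketch that only reads the active environment; Pkg.update(), the API form of ] up, is left out on purpose because it modifies the manifest.

```julia
using Pkg

# Path of the Project.toml that is currently active.
println(Pkg.project().path)

# Equivalent of `]st -m`: list every package recorded in the manifest,
# including indirect dependencies.
Pkg.status(mode=Pkg.PKGMODE_MANIFEST)
```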

Thanks for reaching out! Here is a list of all packages in use. I’ve just run ] up and everything seems to be up-to-date.

Full list

Project SP2T v1.0.0-DEV
Status ~/Dropbox (ASU)/Code/Julia/SP2T/Manifest.toml
[621f4979] AbstractFFTs v1.5.0
[1520ce14] AbstractTrees v0.4.5
[79e6a3ab] Adapt v4.0.4
[66dad0bd] AliasTables v1.1.3
[27a7e980] Animations v0.4.1
[a9b6321e] Atomix v0.1.0
[67c07d97] Automa v1.0.3
[13072b0f] AxisAlgorithms v1.1.0
[39de3d68] AxisArrays v0.4.7
[fa961155] CEnum v0.5.0
[49dc2e85] Calculus v0.5.1
[d360d2e6] ChainRulesCore v1.23.0
[a2cac450] ColorBrewer v0.4.0
[35d6a980] ColorSchemes v3.25.0
[3da002f7] ColorTypes v0.11.5
[c3611d14] ColorVectorSpace v0.10.0
[5ae59095] Colors v0.12.11
[861a8166] Combinatorics v1.0.2
[34da2185] Compat v4.15.0
[187b0558] ConstructionBase v1.5.5
[d38c429a] Contour v0.6.3
[9a962f9c] DataAPI v1.16.0
[864edb3b] DataStructures v0.18.20
[e2d170a0] DataValueInterfaces v1.0.0
[927a84f5] DelaunayTriangulation v1.0.3
[31c24e10] Distributions v0.25.108
[ffbed154] DocStringExtensions v0.9.3
[fa6b7ba4] DualNumbers v0.6.8
[4e289a0a] EnumX v1.0.4
[429591f6] ExactPredicates v2.2.8
[411431e0] Extents v0.1.2
[7a1cc6ca] FFTW v1.8.0
[5789e2e9] FileIO v1.16.3
[8fc22ac5] FilePaths v0.8.3
[48062228] FilePathsBase v0.9.21
[1a297f60] FillArrays v1.11.0
[53c48c17] FixedPointNumbers v0.8.5
[1fa38f19] Format v1.3.7
[b38be410] FreeType v4.1.1
[663a7486] FreeTypeAbstraction v0.10.3
[f7f18e0c] GLFW v3.4.1
[e9467ef8] GLMakie v0.10.2
[46192b85] GPUArraysCore v0.1.6
[cf35fbd7] GeoInterface v1.3.4
[5c1252a2] GeometryBasics v0.4.11
[3955a311] GridLayoutBase v0.11.0
[42e2da0e] Grisu v1.0.2
[34004b35] HypergeometricFunctions v0.3.23
[2803e5a7] ImageAxes v0.6.11
[c817782e] ImageBase v0.1.7
[a09fc81d] ImageCore v0.10.2
[82e4d734] ImageIO v0.6.8
[bc367c6b] ImageMetadata v0.9.9
[9b13fd28] IndirectArrays v1.0.0
[d25df0c9] Inflate v0.1.4
[a98d9a8b] Interpolations v0.15.1
[d1acc4aa] IntervalArithmetic v0.22.12
[8197267c] IntervalSets v0.7.10
[92d709cd] IrrationalConstants v0.2.2
[f1662d9f] Isoband v0.1.1
[c8e1da08] IterTools v1.10.0
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.5.0
[682c06a0] JSON v0.21.4
[b835a17e] JpegTurbo v0.1.5
[63c18a36] KernelAbstractions v0.9.19
[5ab0869b] KernelDensity v0.6.9
[929cbde3] LLVM v7.1.0
[b964fa9f] LaTeXStrings v1.3.1
[8cdb02fc] LazyModules v0.3.1
[2ab3a3ac] LogExpFunctions v0.3.27
[1914dd2f] MacroTools v0.5.13
[ee78f7c6] Makie v0.21.2
[20f20a25] MakieCore v0.8.2
[dbb5928d] MappedArrays v0.4.2
[0a4f8689] MathTeXEngine v0.6.0
[7269a6da] MeshIO v0.4.11
[e1d29d7a] Missings v1.2.0
[66fc600b] ModernGL v1.1.7
[e94cdb99] MosaicViews v0.3.4
[872c559c] NNlib v0.9.17
[77ba4419] NaNMath v1.0.2
[f09324ee] Netpbm v1.1.1
[510215fc] Observables v0.5.5
[6fe1bfb0] OffsetArrays v1.14.0
[52e1d378] OpenEXR v0.3.2
[bac558e1] OrderedCollections v1.6.3
[90014a1f] PDMats v0.11.31
[f57f5aa1] PNGFiles v0.4.3
[19eb6ba3] Packing v0.5.0
[5432bcbf] PaddedViews v0.5.12
[69de0a69] Parsers v2.8.1
[eebad327] PkgVersion v0.3.3
[995b91a9] PlotUtils v1.4.1
[647866c9] PolygonOps v0.1.2
[aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[92933f4c] ProgressMeter v1.10.0
[43287f4e] PtrArrays v1.2.0
[4b34888f] QOI v1.0.0
[1fd47b50] QuadGK v2.9.4
[b3c3ace0] RangeArrays v0.3.2
[c84ed2f1] Ratios v0.4.5
[189a3867] Reexport v1.2.2
[05181044] RelocatableFolders v1.0.1
[ae029012] Requires v1.3.0
[79098fc4] Rmath v0.7.1
[5eaf0fd0] RoundingEmulator v0.2.1
[fdea26ae] SIMD v3.5.0
[6c6a2e73] Scratch v1.2.1
[65257c39] ShaderAbstractions v0.4.1
[992d4aef] Showoff v1.0.3
[73760f76] SignedDistanceFields v0.4.0
[699a6c99] SimpleTraits v0.9.4
[45858cf5] Sixel v0.1.3
[a2af1166] SortingAlgorithms v1.2.1
[276daf66] SpecialFunctions v2.4.0
[cae243ae] StackViews v0.1.1
[90137ffa] StaticArrays v1.9.4
[1e83bf80] StaticArraysCore v1.4.2
[82ae8749] StatsAPI v1.7.0
[2913bbd2] StatsBase v0.34.3
[4c63d2b9] StatsFuns v1.3.1
[09ab397b] StructArrays v0.6.18
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.11.1
[62fd8b95] TensorCore v0.1.1
[731e570b] TiffImages v0.10.0
[3bb67fe8] TranscodingStreams v0.10.8
[981d1d27] TriplotBase v0.1.0
[1cfade01] UnicodeFun v0.4.1
[1986cc42] Unitful v1.20.0
[013be700] UnsafeAtomics v0.2.1
[d80eeb9a] UnsafeAtomicsLLVM v0.1.4
[efce3f68] WoodburyMatrices v1.0.0
[6e34b625] Bzip2_jll v1.0.8+1
[4e9b3aee] CRlibm_jll v1.0.1+0
[83423d85] Cairo_jll v1.18.0+2
[5ae413db] EarCut_jll v2.2.4+0
[2e619515] Expat_jll v2.6.2+0
[b22a6f82] FFMPEG_jll v6.1.1+0
[f5851436] FFTW_jll v3.3.10+0
[a3f928ae] Fontconfig_jll v2.13.96+0
[d7e528f0] FreeType2_jll v2.13.2+0
[559328eb] FriBidi_jll v1.0.14+0
[0656b61e] GLFW_jll v3.3.9+0
[78b55507] Gettext_jll v0.21.0+0
[7746bdde] Glib_jll v2.80.2+0
[3b182d85] Graphite2_jll v1.3.14+0
[2e76f6c2] HarfBuzz_jll v2.8.1+1
[905a6f67] Imath_jll v3.1.11+0
[1d5cc7b8] IntelOpenMP_jll v2024.1.0+0
[aacddb02] JpegTurbo_jll v3.0.3+0
[c1c5ebd0] LAME_jll v3.100.2+0
[dad2f222] LLVMExtra_jll v0.0.29+0
[1d63c593] LLVMOpenMP_jll v15.0.7+0
[dd4b983a] LZO_jll v2.10.2+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
[d4300ac3] Libgcrypt_jll v1.8.11+0
[7e76a0d4] Libglvnd_jll v1.6.0+0
[7add5ba3] Libgpg_error_jll v1.49.0+0
[94ce4f54] Libiconv_jll v1.17.0+0
[4b2f31a3] Libmount_jll v2.40.1+0
[38a345b3] Libuuid_jll v2.40.1+0
[856f044c] MKL_jll v2024.1.0+0
[e7412a2a] Ogg_jll v1.3.5+1
[18a262bb] OpenEXR_jll v3.2.4+0
[458c3c95] OpenSSL_jll v3.0.13+1
[efe28fd5] OpenSpecFun_jll v0.5.5+0
[91d4177d] Opus_jll v1.3.2+0
[30392449] Pixman_jll v0.43.4+0
[f50d1b31] Rmath_jll v0.4.2+0
[02c8fc9c] XML2_jll v2.12.7+0
[aed1982a] XSLT_jll v1.1.34+0
[4f6342f7] Xorg_libX11_jll v1.8.6+0
[0c0b7dd1] Xorg_libXau_jll v1.0.11+0
[935fb764] Xorg_libXcursor_jll v1.2.0+4
[a3789734] Xorg_libXdmcp_jll v1.1.4+0
[1082639a] Xorg_libXext_jll v1.3.6+0
[d091e8ba] Xorg_libXfixes_jll v5.0.3+4
[a51aa0fd] Xorg_libXi_jll v1.7.10+4
[d1454406] Xorg_libXinerama_jll v1.1.4+4
[ec84b674] Xorg_libXrandr_jll v1.5.2+4
[ea2f1a96] Xorg_libXrender_jll v0.9.11+0
[14d82f49] Xorg_libpthread_stubs_jll v0.1.1+0
[c7cfdc94] Xorg_libxcb_jll v1.15.0+0
[c5fb5394] Xorg_xtrans_jll v1.5.0+0
[9a68df92] isoband_jll v0.2.3+0
[a4ae2306] libaom_jll v3.9.0+0
[0ac62f75] libass_jll v0.15.1+0
[f638f0a6] libfdk_aac_jll v2.0.2+0
[b53b4c65] libpng_jll v1.6.43+1
[075b6546] libsixel_jll v1.10.3+0
[f27f6e37] libvorbis_jll v1.3.7+1
[1317d2d5] oneTBB_jll v2021.12.0+0
[1270edf5] x264_jll v2021.5.5+0
[dfaa095f] x265_jll v3.5.0+0
[0dad84c5] ArgTools v1.1.1
[56f22d72] Artifacts
[2a0f44e3] Base64
[8bf52ea8] CRC32c
[ade2ca70] Dates
[8ba89e20] Distributed
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.10.0
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays v1.10.0
[10745b16] Statistics v1.10.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.4.0+0
[e37daf67] LibGit2_jll v1.6.4+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.2+1
[14a3606d] MozillaCACerts_jll v2023.1.10
[4536629a] OpenBLAS_jll v0.3.23+4
[05823500] OpenLibm_jll v0.8.1+2
[efcefdf7] PCRE2_jll v10.42.0+1
[bea87d4a] SuiteSparse_jll v7.2.1+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.8.0+1
[8e850ede] nghttp2_jll v1.52.0+1
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use status --outdated -m

Quick update: I just reproduced the error in an empty environment. It seems that NNlib is the trigger. Reverting NNlib from v0.9.17 to v0.9.16 does not resolve it.

I don’t see CUDA.jl anywhere in your environment. The B2Z5u slug reported in your error message also doesn’t correspond to any released version. Finally, you seem to be using LLVM.jl 7.1, which isn’t supported by any released version of CUDA.jl yet. So I think you have CUDA.jl#master somewhere in a top-level environment, which is not a supported configuration. If you need to keep using that version, upgrade GPUCompiler to v0.26.5 to fix this specific error, but be aware that the error is caused by your environment mixing unsupported versions of packages.
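
One way to check for this kind of mix is to look at which environments Julia searches and which versions the active one resolves. The following is a minimal sketch using only Base and Pkg; the three package names are simply the ones that appear in the stack trace above, and the output will depend on the machine.

```julia
using Pkg

# Environments searched, in order, when a package is loaded.
# A package missing from the active project can still be picked up
# from a later entry, such as the default (v1.10) environment.
foreach(println, Base.load_path())

# Versions that the active environment resolves for the packages seen
# in the stack trace. Nothing is printed for a package that is not part
# of this environment's manifest.
wanted = ("CUDA", "GPUCompiler", "LLVM")
for info in values(Pkg.dependencies())
    info.name in wanted && println(info.name, " => ", info.version)
end
```

After `using CUDA`, calling `pkgversion(CUDA)` (Julia 1.9 or later) and `pathof(CUDA)` reports which copy was actually loaded. In the Pkg REPL, `add GPUCompiler@0.26.5` requests the version mentioned above.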

Ah, I see; everything indeed works using LLVM.jl v6.6.3.

I am using CUDA.jl v5.3.5; it is not in my environment because I am writing an extension for my package, and CUDA.jl is a weak dependency. I guess I should figure out how to make my package aware of the compatibility requirements of weak dependencies. :thinking: Perhaps, as a temporary workaround, I can include LLVM.jl as a normal dependency and set the version upper bound to v6.6.3.
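
For reference, Pkg reads compat bounds for weak dependencies from the same [compat] section as regular dependencies. The sketch below parses a hypothetical Project.toml fragment and lists any weak dependency that lacks a bound. The extension name SP2TCUDAExt and the bounds "5.3" and "1.10" are assumptions for illustration, not taken from the actual package.

```julia
using TOML

# Hypothetical Project.toml fragment; the extension name and the
# compat bounds are assumptions, not the real SP2T project file.
project = TOML.parse("""
name = "SP2T"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
SP2TCUDAExt = "CUDA"

[compat]
CUDA = "5.3"
julia = "1.10"
""")

weakdeps = get(project, "weakdeps", Dict{String,Any}())
compat = get(project, "compat", Dict{String,Any}())

# Weak dependencies that have no [compat] entry yet.
unbounded = [name for name in keys(weakdeps) if !haskey(compat, name)]
println(isempty(unbounded) ? "all weak dependencies have compat bounds" : unbounded)
```

Note that such a bound only constrains the resolver when CUDA.jl is resolved in the same environment as the package; it would not affect a copy of CUDA.jl loaded from a different environment on the load path. Capping LLVM.jl as a regular dependency would presumably follow the same pattern, with an entry under [deps] and a matching one under [compat].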

Thank you very much for your help!