Updating CUDA seems to have completely broken the package

Salutations,

I am currently a PhD student and defended my thesis last month. Most of my Master’s research and all of my PhD research were developed in Julia. I recently decided to update Julia and rerun my old code, at which point I found that most of it was broken. Upon investigation, I found that many basic operations involving CUDA.jl fail on 1.10 (CUDA.randn, fft, broadcasting) but work fine on 1.9 (perhaps an advertisement for juliaup is in order; it really saved me).
As an example, the following code:

x = CUDA.randn(3, 1080, 1960)

generates the following stack trace:

ERROR: MethodError: no method matching length(::Nothing)

Closest candidates are:
  length(::LLVM.StructTypeElementSet)
   @ LLVM ~/.julia/packages/LLVM/HykgZ/src/core/type.jl:236
  length(::Base.MethodSpecializations)
   @ Base reflection.jl:1166
  length(::ExponentialBackOff)
   @ Base error.jl:267
  ...

Stacktrace:
  [1] #s597#122
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:18 [inlined]
  [2] var"#s597#122"(f::Any, tt::Any, ::Any, job::Any)
    @ GPUCompiler ./none:0
  [3] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
  [4] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:71
  [5] cufunction(f::typeof(GPUArrays.getindex_kernel), tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{…}, CuDeviceArray{…}, Tuple{…}, UnitRange{…}}}; name::Nothing, always_inline::Bool, kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
  [6] cufunction
    @ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:299 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
  [8] #launch_heuristic#245
    @ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:17 [inlined]
  [9] launch_heuristic
    @ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:15 [inlined]
 [10] gpu_call(::typeof(GPUArrays.getindex_kernel), ::CuArray{…}, ::CuArray{…}, ::Tuple{…}, ::UnitRange{…}; target::CuArray{…}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:61
 [11] gpu_call
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:34 [inlined]
 [12] _getindex
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:48 [inlined]
 [13] getindex(A::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::UnitRange{Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:38
 [14] outofplace_pow2(shape::Tuple{Int64, Int64, Int64}, ctor::CUDA.CURAND.var"#143#145"{DataType, Tuple{Int64, Int64, Int64}}, f::CUDA.CURAND.var"#144#146"{@Kwargs{}, CUDA.CURAND.RNG})
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:167
 [15] randn(rng::CUDA.CURAND.RNG, T::Type{Float32}, dims::Tuple{Int64, Int64, Int64}; kwargs::@Kwargs{})
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175
 [16] randn
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175 [inlined]
 [17] randn
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:184 [inlined]
 [18] randn(::Int64, ::Int64, ::Int64)
    @ CUDA ~/.julia/packages/CUDA/ZdCxS/src/random.jl:323
 [19] top-level scope
    @ REPL[4]:1
 [20] top-level scope
    @ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155
Some type information was truncated. Use `show(err)` to see complete types.

What leads me to believe this is a bug is that CUDA.rand does not cause the same problem. If randn were the only broken function, there would be many ways around it, but I have found that fft (though rfft works fine) and even simple broadcasting on a CuArray cause similar errors. This severely hinders my use of JuliaGPU.
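
For reference, a minimal sketch of what does and doesn’t work for me on 1.10 (the fft call assumes CUDA.jl’s CUFFT integration, which re-exports the AbstractFFTs API):

using CUDA
using CUDA.CUFFT

a = CUDA.rand(Float32, 3, 1080, 1960)  # fine
CUDA.randn(Float32, 3, 1080, 1960)     # MethodError as shown above
a .+ 1f0                               # simple broadcast: similar error
fft(a)                                 # similar error (rfft(a) works fine)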

I was unfortunately away from my University machine for quite a while, as I had to submit my thesis and slides to the board, prepare for life after the PhD, and deal with other such matters. I don’t know whether this issue affects many people or is specific to me and my lateness in updating. I wanted to know if anyone has had similar issues and, if so, how they fixed them, or whether reverting to 1.9 is the path here.
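
If reverting does turn out to be the answer, juliaup makes keeping both versions around painless (standard juliaup commands):

juliaup add 1.9
julia +1.9   # launches 1.9 while 1.10 stays the default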

My Julia release version is 1.10.0, juliaup is 1.14.5, and my University machine runs Arch Linux; it has been properly maintained, as it was used to write my presentation and thesis even while Julia wasn’t in active use.
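
For completeness, these are the standard diagnostic calls for a report like this (output omitted):

using CUDA, InteractiveUtils
versioninfo()       # Julia, OS, and CPU details
CUDA.versioninfo()  # CUDA.jl, driver, toolkit, and GPU details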

You are using an outdated version of GPUCompiler.jl.

Thank you very much for the prompt response. I did update my packages right after installation, and trying to update GPUCompiler.jl specifically does not seem to work:

(@v1.10) pkg> update GPUCompiler
    Updating registry at `~/.julia/registries/General.toml`
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
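
In case it is useful to others: the resolver can show which manifest packages are held back and by what (the `--outdated` flag needs Julia ≥ 1.8):

(@v1.10) pkg> status --outdated -m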

EDIT: Removing GPUCompiler and CUDA and then re-adding them seems to have worked. Thank you very much! I’ll mark this as solved.
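
For anyone else hitting this, the fix amounted to something like the following; since GPUCompiler is an indirect dependency, removing and re-adding CUDA forces the resolver to pick a fresh GPUCompiler as well:

(@v1.10) pkg> rm CUDA
(@v1.10) pkg> add CUDA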

I’m running into the same error, but I’m using an older version of CUDA.jl, v3.13.1 (for compatibility with a K80). Can I ask what version of GPUCompiler you ended up using?
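
(If it helps for comparison, the exact version in use can be read from the manifest:)

(@v1.10) pkg> status -m GPUCompiler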