Updating CUDA seems to have completely broken the package

Salutations,

I am currently a PhD student and defended my thesis last month. Most of my Master’s research and all of my PhD research were developed in Julia. I recently decided to update Julia and rerun my old code, at which point I found that most of it was broken. Upon investigation, I found that many basic operations involving CUDA.jl fail on 1.10 (CUDA.randn, fft, broadcasting) but work fine on 1.9 (perhaps an advertisement for juliaup is in order; it really saved me).
As an example, the following code:

x = CUDA.randn(3, 1080, 1960)

generates the following stack trace:

ERROR: MethodError: no method matching length(::Nothing)

Closest candidates are:
  length(::LLVM.StructTypeElementSet)
   @ LLVM ~/.julia/packages/LLVM/HykgZ/src/core/type.jl:236
  length(::Base.MethodSpecializations)
   @ Base reflection.jl:1166
  length(::ExponentialBackOff)
   @ Base error.jl:267
  ...

Stacktrace:
  [1] #s597#122
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:18 [inlined]
  [2] var"#s597#122"(f::Any, tt::Any, ::Any, job::Any)
    @ GPUCompiler ./none:0
  [3] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
  [4] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:71
  [5] cufunction(f::typeof(GPUArrays.getindex_kernel), tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{…}, CuDeviceArray{…}, Tuple{…}, UnitRange{…}}}; name::Nothing, always_inline::Bool, kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
  [6] cufunction
    @ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:299 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
  [8] #launch_heuristic#245
    @ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:17 [inlined]
  [9] launch_heuristic
    @ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:15 [inlined]
 [10] gpu_call(::typeof(GPUArrays.getindex_kernel), ::CuArray{…}, ::CuArray{…}, ::Tuple{…}, ::UnitRange{…}; target::CuArray{…}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:61
 [11] gpu_call
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:34 [inlined]
 [12] _getindex
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:48 [inlined]
 [13] getindex(A::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::UnitRange{Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:38
 [14] outofplace_pow2(shape::Tuple{Int64, Int64, Int64}, ctor::CUDA.CURAND.var"#143#145"{DataType, Tuple{Int64, Int64, Int64}}, f::CUDA.CURAND.var"#144#146"{@Kwargs{}, CUDA.CURAND.RNG})
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:167
 [15] randn(rng::CUDA.CURAND.RNG, T::Type{Float32}, dims::Tuple{Int64, Int64, Int64}; kwargs::@Kwargs{})
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175
 [16] randn
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175 [inlined]
 [17] randn
    @ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:184 [inlined]
 [18] randn(::Int64, ::Int64, ::Int64)
    @ CUDA ~/.julia/packages/CUDA/ZdCxS/src/random.jl:323
 [19] top-level scope
    @ REPL[4]:1
 [20] top-level scope
    @ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155
Some type information was truncated. Use `show(err)` to see complete types.

What leads me to believe this is a bug is that CUDA.rand does not cause the same problem. If randn were the only broken function, there would be many ways around it, but I have found that fft (though rfft works fine) and even simple broadcasting on a CuArray cause similar errors. This severely hinders my use of JuliaGPU.
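
For reference, a minimal sketch of what does and doesn’t work for me on 1.10 (the fft call assumes CUDA.jl’s CUFFT integration, which re-exports the AbstractFFTs API):

using CUDA
using CUDA.CUFFT

a = CUDA.rand(Float32, 3, 1080, 1960)  # fine
CUDA.randn(Float32, 3, 1080, 1960)     # MethodError as shown above
a .+ 1f0                               # simple broadcast: similar error
fft(a)                                 # similar error (rfft(a) works fine)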

I was unfortunately away from my University machine for quite a while, as I had to submit my thesis and slides to the board, prepare for life after the PhD, and deal with other such matters. I don’t know whether this issue affects many people or is specific to me and my lateness in updating. I wanted to know if anyone has had similar issues and, if so, how they fixed them, or whether reverting to 1.9 is the path here.
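
If reverting does turn out to be the answer, juliaup makes keeping both versions around painless (standard juliaup commands):

juliaup add 1.9
julia +1.9   # launches 1.9 while 1.10 stays the default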

My Julia release version is 1.10.0, juliaup is 1.14.5, and my University machine runs Arch Linux; it has been properly maintained, as it was used to write my presentation and thesis even while Julia wasn’t in active use.
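
For completeness, these are the standard diagnostic calls for a report like this (output omitted):

using CUDA, InteractiveUtils
versioninfo()       # Julia, OS, and CPU details
CUDA.versioninfo()  # CUDA.jl, driver, toolkit, and GPU details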

You are using an outdated version of GPUCompiler.jl.

Thank you very much for the prompt response. I did update my packages right after installation, and trying to update GPUCompiler.jl specifically does not seem to work:

(@v1.10) pkg> update GPUCompiler
    Updating registry at `~/.julia/registries/General.toml`
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
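
In case it is useful to others: the resolver can show which manifest packages are held back and by what (the `--outdated` flag needs Julia ≥ 1.8):

(@v1.10) pkg> status --outdated -m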

EDIT: Removing GPUCompiler and CUDA and then re-adding them seems to have worked. Thank you very much! I’ll mark this as solved.
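
For anyone else hitting this, the fix amounted to something like the following; since GPUCompiler is an indirect dependency, removing and re-adding CUDA forces the resolver to pick a fresh GPUCompiler as well:

(@v1.10) pkg> rm CUDA
(@v1.10) pkg> add CUDA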

I’m running into the same error, but I’m using an older version of CUDA.jl, v3.13.1 (for compatibility with a K80). Can I ask what version of GPUCompiler you ended up using?
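
(If it helps for comparison, the exact version in use can be read from the manifest:)

(@v1.10) pkg> status -m GPUCompiler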