Salutations,
I am currently a PhD student and defended my thesis last month. Most of my Master’s research and all of my PhD research was developed in Julia. I recently updated Julia and reran my old code, at which point I found that most of it was broken. Upon investigation, I found that many basic operations involving CUDA.jl (CUDA.randn, fft, broadcasting) do not work on 1.10 but seem to work fine on 1.9. (Perhaps an advertisement for juliaup is in order; it really saved me.)
As an example, the following code:
x = CUDA.randn(3, 1080, 1960)
generates the following stack trace:
ERROR: MethodError: no method matching length(::Nothing)
Closest candidates are:
length(::LLVM.StructTypeElementSet)
@ LLVM ~/.julia/packages/LLVM/HykgZ/src/core/type.jl:236
length(::Base.MethodSpecializations)
@ Base reflection.jl:1166
length(::ExponentialBackOff)
@ Base error.jl:267
...
Stacktrace:
[1] #s597#122
@ ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:18 [inlined]
[2] var"#s597#122"(f::Any, tt::Any, ::Any, job::Any)
@ GPUCompiler ./none:0
[3] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[4] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:71
[5] cufunction(f::typeof(GPUArrays.getindex_kernel), tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{…}, CuDeviceArray{…}, Tuple{…}, UnitRange{…}}}; name::Nothing, always_inline::Bool, kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
[6] cufunction
@ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:299 [inlined]
[7] macro expansion
@ ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
[8] #launch_heuristic#245
@ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:17 [inlined]
[9] launch_heuristic
@ ~/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:15 [inlined]
[10] gpu_call(::typeof(GPUArrays.getindex_kernel), ::CuArray{…}, ::CuArray{…}, ::Tuple{…}, ::UnitRange{…}; target::CuArray{…}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:61
[11] gpu_call
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/device/execution.jl:34 [inlined]
[12] _getindex
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:48 [inlined]
[13] getindex(A::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, I::UnitRange{Int64})
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:38
[14] outofplace_pow2(shape::Tuple{Int64, Int64, Int64}, ctor::CUDA.CURAND.var"#143#145"{DataType, Tuple{Int64, Int64, Int64}}, f::CUDA.CURAND.var"#144#146"{@Kwargs{}, CUDA.CURAND.RNG})
@ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:167
[15] randn(rng::CUDA.CURAND.RNG, T::Type{Float32}, dims::Tuple{Int64, Int64, Int64}; kwargs::@Kwargs{})
@ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175
[16] randn
@ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:175 [inlined]
[17] randn
@ CUDA.CURAND ~/.julia/packages/CUDA/ZdCxS/lib/curand/random.jl:184 [inlined]
[18] randn(::Int64, ::Int64, ::Int64)
@ CUDA ~/.julia/packages/CUDA/ZdCxS/src/random.jl:323
[19] top-level scope
@ REPL[4]:1
[20] top-level scope
@ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155
Some type information was truncated. Use `show(err)` to see complete types.
What leads me to believe this is a bug is that CUDA.rand does not cause the same problem. If randn were the only function that failed, there would be many ways around it, but I have found that fft (though rfft works fine) and even simple broadcasting on a CUDA array cause similar issues. This severely hinders my use of JuliaGPU.
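For anyone who wants to check whether their setup shows the same pattern, here is a minimal sketch that runs each of the operations mentioned above and reports which ones throw. The helper name try_op and the small 8×8 test array are my own choices for illustration, and I am assuming that `using CUDA.CUFFT` brings fft and rfft into scope for CuArray inputs. It needs a working GPU, so treat it as a rough diagnostic rather than a definitive test.

```julia
using CUDA
using CUDA.CUFFT  # assumption: makes fft/rfft available for CuArray inputs

# Hypothetical helper: run one operation and report whether it threw.
# CUDA.@sync waits for the GPU so asynchronous errors surface inside the try.
function try_op(name, f)
    try
        CUDA.@sync f()
        println(rpad(name, 12), "ok")
    catch err
        println(rpad(name, 12), "FAILED: ", typeof(err))
    end
end

CUDA.versioninfo()  # prints CUDA.jl, driver, and toolkit information

x = CUDA.rand(Float32, 8, 8)  # CUDA.rand is the call reported as working

try_op("rand",      () -> CUDA.rand(Float32, 3, 1080, 1960))
try_op("randn",     () -> CUDA.randn(3, 1080, 1960))
try_op("broadcast", () -> x .+ 1f0)
try_op("rfft",      () -> rfft(x))
try_op("fft",       () -> fft(x))
```

Running the same script under 1.9 and 1.10 (for example via juliaup) should make it easy to see whether the failures depend on the Julia version or on the installed package versions.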
I was unfortunately away from my university machine for quite a while, as I had to submit my thesis and slides to the board, make preparations for life after the PhD, and deal with other such matters. I don’t know whether this issue affects many people or is specific to me and my lateness in updating. I would like to know if anyone has had similar issues and, if so, how they fixed them, or whether reverting to 1.9 is the way to go.
My Julia release version is 1.10.0, juliaup is 1.14.5, and my university machine runs Arch Linux, which has been properly maintained, as it was used to write my presentation and thesis even while Julia wasn’t actively used.