InvalidIRError on rand! when using Flux and MLDatasets

I recently updated my system and Julia, and now the following program fails (it didn't fail before the updates):

using Flux
using MLDatasets

using CUDA
using Random
a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)

I get this error:

InvalidIRError: compiling kernel rand!(CuDeviceVector{Float32, 1}, UInt32, UInt32) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to CUDA.Philox2x32{R}() where R in CUDA at ~/.julia/packages/CUDA/AHr5I/src/device/random.jl:46)
Stacktrace:
 [1] Philox2x32
   @ ~/.julia/packages/CUDA/AHr5I/src/device/random.jl:62
 [2] #default_rng
   @ ~/.julia/packages/CUDA/AHr5I/src/device/random.jl:95
 [3] kernel
   @ ~/.julia/packages/CUDA/AHr5I/src/random.jl:39
Reason: unsupported dynamic function invocation (call to rand(rng::AbstractRNG, ::Type{X}) where X in Random at /usr/share/julia/stdlib/v1.7/Random/src/Random.jl:257)
Stacktrace:
 [1] kernel
   @ ~/.julia/packages/CUDA/AHr5I/src/random.jl:51
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#356", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/validation.jl:124
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:386 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LDL7n/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:384 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/AHr5I/src/compiler/execution.jl:332
  [7] #260
    @ ~/.julia/packages/CUDA/AHr5I/src/compiler/execution.jl:325 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#356", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/AHr5I/src/compiler/execution.jl:324
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/cache.jl:90
 [11] cufunction(f::CUDA.var"#kernel#356", tt::Type{Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/AHr5I/src/compiler/execution.jl:297
 [12] macro expansion
    @ ~/.julia/packages/CUDA/AHr5I/src/compiler/execution.jl:102 [inlined]
 [13] rand!(rng::CUDA.RNG, A::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ CUDA ~/.julia/packages/CUDA/AHr5I/src/random.jl:60
 [14] top-level scope
    @ ~/tmp/test.jl:7
in expression starting at ~/tmp/test.jl:7
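
Following the hint in the error message, I guess the failing code can be introspected with something like this (a sketch; I believe the interactive mode needs Cthulhu.jl installed):

err = try
    Random.rand!(CUDA.default_rng(), a)
catch e
    e
end
code_typed(err; interactive = true)  # descend into the kernel to find the dynamic calls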

If I remove using Flux, using MLDatasets, or both, then it works. The error only appears when both Flux and MLDatasets are loaded.

And if I run the Random.rand! command both before and after these two using statements, then it also works without any error.

I don’t understand what’s happening, and I don’t know whether it’s a bug in CUDA, Flux, or MLDatasets. Where should I report it?

For now I’ll just run the rand! command before everything else in my code, but a real fix would be better.
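
Concretely, something like this (just moving one rand! call ahead of the other packages):

using CUDA, Random
Random.rand!(CUDA.default_rng(), CuArray{Float32}(undef, 2))  # force the kernel to compile first

using Flux
using MLDatasets
# the rest of the code, where rand! now works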

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
NVIDIA driver 510.68.2, for CUDA 11.6
CUDA driver 11.6

Libraries: 
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 11.0.0+510.68.2
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.7.2
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce GTX 1080 (sm_61, 3.941 GiB / 8.000 GiB available)

I also tried with CUDA master and the problem is still there.

That’s interesting. There are a couple of known Julia ‘issues’ that may relate to this, but they typically only apply to non-concrete function invocations. Could you reduce the MWE further, i.e., find out which exact definitions are the culprit here?

How can I reduce the example further?

I tried to use only a subset of the packages:

using Flux: flatten
using MLDatasets: CIFAR10

But I think that only changes what’s made available in scope; it doesn’t change the result.

You can try devving Flux and MLDatasets and disabling parts of the packages (initially by just disabling includes from the main source file).
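
Something like this (a sketch; dev checks out editable copies under ~/.julia/dev/):

# In the Pkg REPL (press ] at the julia> prompt):
pkg> dev Flux MLDatasets

# Then restart Julia and comment out include(...) lines in
# ~/.julia/dev/Flux/src/Flux.jl and ~/.julia/dev/MLDatasets/src/MLDatasets.jl
# until the error disappears.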

First post here (apologies for any beginner mistakes!)
I can confirm a similar error when running on Julia v1.7.2:

using Flux
using MLDatasets
using CUDA
using Random

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)

I am not yet sure why, but loading CUDA and Random and executing the command before loading Flux and MLDatasets seems to fix the issue:

using CUDA
using Random

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)

using Flux
using MLDatasets

a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)

When I comment out these lines in Flux’s deprecations.jl, the error disappears:

function Broadcast.broadcasted(f::Recur, args...)
  # This had an explicit @adjoint rule, calling Zygote.∇map(__context__, f, args...), until v0.12
  Base.depwarn("""Broadcasting is not safe to use with RNNs, as it does not guarantee an iteration order.
    Re-writing this as a comprehension would be better.""", :broadcasted)
  map(f, args...)  # map isn't really safe either, but 
end

Note that this deprecation warning is not displayed when I run the test program.

In MLDatasets the problem seems to be the using DataFrames statement. If I comment it out (along with the rest of the file, to avoid other errors) then the error also disappears.

So I dev’ed DataFrames, and the error disappears when I comment out this definition in broadcasting.jl:

Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::DataFrameStyle) =
    DataFrameStyle()

I don’t know how to move these definitions into the MWE since they depend on the packages’ types.

So in summary, the error appears when both of these conditions apply:

  • Flux is used and it defines Broadcast.broadcasted(f::Recur, args...)
  • DataFrames is used and it defines Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::DataFrameStyle)

You could try replicating the type definitions (or at least the bare minimum you need). Either way, can you open an issue on CUDA.jl with your latest findings?
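
Something like this, perhaps (hypothetical stand-in types, untested; FakeRecur and FakeStyle are just placeholders for the Flux and DataFrames definitions above):

# Stand-in for Flux's callable Recur type:
struct FakeRecur end
(f::FakeRecur)(x) = x

# Mimic Flux's broadcasted method for Recur:
Base.Broadcast.broadcasted(f::FakeRecur, args...) = map(f, args...)

# Mimic DataFrames' broad BroadcastStyle method:
struct FakeStyle <: Base.Broadcast.BroadcastStyle end
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::FakeStyle) = FakeStyle()

using CUDA, Random
a = CuArray{Float32}(undef, 2)
Random.rand!(CUDA.default_rng(), a)  # check whether this now fails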

I created an issue with a smaller MWE:
https://github.com/JuliaGPU/CUDA.jl/issues/1508