Package Entanglement: Why does using one package break another?

Hi!
I stumbled upon an unexpected problem with package/library management in Julia that I think is worth discussing, because it seems quite general and I haven’t found a similar topic.

My env:

(@v1.7) pkg> status Flux
      Status `~/.julia/environments/v1.7/Project.toml`
  [587475ba] Flux v0.13.4

(@v1.7) pkg> status DataFrames
      Status `~/.julia/environments/v1.7/Project.toml`
  [a93c6f00] DataFrames v1.3.4

(@v1.7) pkg> status CUDA
      Status `~/.julia/environments/v1.7/Project.toml`
  [052768ef] CUDA v3.12.0

The problem appears when I load both Flux and DataFrames in the same session:

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> using CUDA

julia> using DataFrames

julia> m = Dropout(0.5) |> gpu
Dropout(0.5)

julia> trainmode!(m)
Dropout(0.5)

julia> m(CUDA.rand(10))
ERROR: InvalidIRError: compiling kernel rand!(CuDeviceVector{Float32, 1}, UInt32, UInt32) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to CUDA.Philox2x32{R}() where R in CUDA at /home/bodo/.julia/packages/CUDA/DfvRa/src/device/random.jl:46)
Stacktrace:
 [1] Philox2x32
   @ ~/.julia/packages/CUDA/DfvRa/src/device/random.jl:62
 [2] #default_rng
   @ ~/.julia/packages/CUDA/DfvRa/src/device/random.jl:95
 [3] kernel
   @ ~/.julia/packages/CUDA/DfvRa/src/random.jl:41
Reason: unsupported dynamic function invocation (call to rand(rng::Random.AbstractRNG, ::Type{X}) where X in Random at /opt/julia-1.7.1/share/julia/stdlib/v1.7/Random/src/Random.jl:257)
Stacktrace:
 [1] kernel
   @ ~/.julia/packages/CUDA/DfvRa/src/random.jl:53
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#320", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/N98un/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/N98un/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/jgSVI/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/N98un/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/N98un/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{CUDA.var"#kernel#320", Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/N98un/src/driver.jl:76
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/N98un/src/cache.jl:90
 [11] cufunction(f::CUDA.var"#kernel#320", tt::Type{Tuple{CuDeviceVector{Float32, 1}, UInt32, UInt32}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:299
 [12] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:102 [inlined]
 [13] rand!(rng::CUDA.RNG, A::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/random.jl:62
 [14] _dropout_mask(rng::CUDA.RNG, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, p::Float64; dims::Function)
    @ Flux ~/.julia/packages/Flux/KkC79/src/layers/normalise.jl:45
 [15] #dropout_mask#318
    @ ~/.julia/packages/Flux/KkC79/src/layers/normalise.jl:39 [inlined]
 [16] dropout(rng::CUDA.RNG, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, p::Float64; dims::Function, active::Bool)
    @ Flux ~/.julia/packages/Flux/KkC79/src/layers/normalise.jl:34
WARNING: both Losses and NNlib export "ctc_loss"; uses of it in module Flux must be qualified
 [17] (::Dropout{Float64, Colon, CUDA.RNG})(x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Flux ~/.julia/packages/Flux/KkC79/src/layers/normalise.jl:111
 [18] top-level scope
    @ REPL[9]:1
 [19] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

The Dropout layer suddenly stops working, but only on the GPU; on the CPU it works properly:

julia> m = cpu(m)
Dropout(0.5)

julia> trainmode!(m)
Dropout(0.5)

julia> m(rand(10))
10-element Vector{Float64}:
 0.0
 0.5422154108717265
 1.415133543027147
 0.0
 0.7556977370953
 1.022560028161302
 0.5363348842605291
 0.0
 0.9809514073811558
 0.3022707863331904

Without “using DataFrames”, it also works fine on the GPU:

julia> using Flux

julia> using CUDA

julia> m = Dropout(0.5) |> gpu
Dropout(0.5)

julia> trainmode!(m)
Dropout(0.5)

julia> m(CUDA.rand(10))
10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 1.8166121
 0.28154758
 1.668092
 1.8140966
 0.020490058
 0.3029256
 0.0
 0.29084814
 0.0
 0.0

This was quite surprising, and I must say I spent some time trying to solve it. My questions are:

  • Why does it happen? (I assume DataFrames may be exporting rand!, but why would that suddenly break Flux’s internal dependencies?)
  • How can I solve it, if I need to use both packages in one session?
  • Is it a known issue?
  • Are there guidelines on how to prevent this when building your own package?

This is JuliaGPU/CUDA.jl issue #1508, “Method definitions break native rand! kernel”. If someone could identify which compiler heuristics are being violated and how, we might have a shot at fixing it.
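
To make “compiler heuristics” a bit more concrete, here is a minimal CPU-only sketch of one such heuristic: when a call site could match more than a small number of methods (3 by default, as far as I know), type inference stops enumerating them and treats the call as dynamic. All the types and functions below (`Animal`, `legs`, `first_legs`) are hypothetical names made up for the illustration, and this is not a diagnosis of issue #1508; it only shows how method definitions added elsewhere can change what the compiler infers for code that was not touched.

```julia
# Hypothetical types and functions, for illustration only.
abstract type Animal end
struct Cat <: Animal end
struct Dog <: Animal end
struct Cow <: Animal end
struct Owl <: Animal end

legs(::Cat) = 4
legs(::Dog) = 4

# The element type is abstract, so inference has to consider every
# `legs` method that could apply to some `Animal`.
first_legs(xs::Vector{Animal}) = legs(xs[1])

# Only two candidate methods exist at this point.
inferred_before = only(Base.return_types(first_legs, (Vector{Animal},)))

# Simulate "loading another package" that adds more methods to `legs`.
legs(::Cow) = 4
legs(::Owl) = 2

# Four candidate methods now exist; `first_legs` itself is unchanged.
inferred_after = only(Base.return_types(first_legs, (Vector{Animal},)))

println("before: ", inferred_before)
println("after:  ", inferred_after)
```

On the CPU a widened return type like this just means a slower dynamic dispatch. A GPU kernel cannot perform dynamic dispatch at all, which is why an inference change of this general kind would surface there as “unsupported dynamic function invocation” rather than as a slowdown.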

On the DataFrames.jl side, I have re-checked that we specify the broadcasting interface correctly, and I do not see an error there. If there is one, please let me know and I will fix it.