Why is Flux/NNlib falling back to im2col instead of MIOpen on AMD GPU?

Hello! I’ve been trying to get Flux’s convolutional layers, specifically the Flux.NNlib.conv! function, to use the MIOpen implementation rather than the im2col or direct fallbacks. I’ve verified that MIOpen is available with AMDGPU.functional(:MIOpen), and AMDGPU’s version info shows that all of the necessary libraries, aside from rocFFT, are present. Julia also recognizes my GPU, an AMD Radeon RX 7900 XT.
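For reference, these are the checks I ran (the output comments reflect my machine, so treat them as illustrative rather than guaranteed):

using AMDGPU

AMDGPU.functional(:MIOpen)  # returns true here
AMDGPU.versioninfo()        # lists HIP, rocBLAS, MIOpen, etc.; only rocFFT is missing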

I’ve included the code below. Has anyone else run into this issue?

using Flux, AMDGPU

ct = Conv((3,3), 4 => 32, pad=1, stride=1)  # 3×3 kernel, 4 input channels => 32 output channels
r_ct = roc(ct)                              # move the layer's parameters to the GPU
x = ROCArray(rand(Float32, 40, 40, 4, 1))   # WHCN-ordered input on the GPU

r_ct(x)

This always errors due to scalar indexing, because Flux is not calling the MIOpen-compatible convolution function:

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
  [2] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
  [3] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
  [4] getindex
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/indexing.jl:50 [inlined]
  [5] scalar_getindex
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/indexing.jl:36 [inlined]
  [6] _getindex
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/indexing.jl:19 [inlined]
  [7] getindex
    @ ~/.julia/packages/GPUArrays/3a5jB/src/host/indexing.jl:17 [inlined]
  [8] getindex
    @ ./subarray.jl:316 [inlined]
  [9] im2col!(col::ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer}, x::SubArray{Float32, 4, ROCArray{…}, Tuple{…}, true}, cdims::DenseConvDims{3, 3, 3, 6, 3})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/impl/conv_im2col.jl:253
 [10] (::NNlib.var"#conv_part#538"{ROCArray{…}, Float32, Float32, SubArray{…}, SubArray{…}, ROCArray{…}, DenseConvDims{…}, Int64, Int64, Int64})(task_n::Int64, part::UnitRange{Int64})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/impl/conv_im2col.jl:53
 [11] conv_im2col!(y::SubArray{…}, x::SubArray{…}, w::ROCArray{…}, cdims::DenseConvDims{…}; col::ROCArray{…}, alpha::Float32, beta::Float32, ntasks::Int64)
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/impl/conv_im2col.jl:69
 [12] conv_im2col!(y::SubArray{…}, x::SubArray{…}, w::ROCArray{…}, cdims::DenseConvDims{…})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/impl/conv_im2col.jl:23
 [13] (::NNlib.var"#conv_group#186"{@Kwargs{}, ROCArray{…}, ROCArray{…}, ROCArray{…}, DenseConvDims{…}})(xc::UnitRange{Int64}, wc::UnitRange{Int64})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/conv.jl:209
 [14] conv!(out::ROCArray{…}, in1::ROCArray{…}, in2::ROCArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/conv.jl:218
 [15] conv!
    @ ~/.julia/packages/NNlib/srXYX/src/conv.jl:185 [inlined]
 [16] #conv!#143
    @ ~/.julia/packages/NNlib/srXYX/src/conv.jl:145 [inlined]
 [17] conv!
    @ ~/.julia/packages/NNlib/srXYX/src/conv.jl:140 [inlined]
 [18] conv(x::ROCArray{Float32, 4, AMDGPU.Runtime.Mem.HIPBuffer}, w::ROCArray{Float32, 4, AMDGPU.Runtime.Mem.HIPBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; kwargs::@Kwargs{})
    @ NNlib ~/.julia/packages/NNlib/srXYX/src/conv.jl:88
 [19] conv
    @ ~/.julia/packages/NNlib/srXYX/src/conv.jl:83 [inlined]
 [20] (::Conv{2, 4, typeof(identity), ROCArray{Float32, 4, AMDGPU.Runtime.Mem.HIPBuffer}, ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}})(x::ROCArray{Float32, 4, AMDGPU.Runtime.Mem.HIPBuffer})
    @ Flux ~/.julia/packages/Flux/DZYiO/src/layers/conv.jl:201
 [21] top-level scope
    @ REPL[30]:1

I hit what looks like the same issue on AMDGPU/Flux/NNlib: Base.get_extension(NNlib, :NNlibAMDGPUExt) reported that the extension was loaded, but NNlib.conv on a ROCArray still fell back to src/impl/conv_im2col.jl, causing the scalar-indexing error. In my case, running julia --compiled-modules=no --project=. repro.jl made the same repro work immediately, which suggests the NNlibAMDGPUExt methods were being skipped when loading from the precompile cache, even though AMDGPU.functional(:MIOpen) was true at runtime.
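To see where dispatch actually lands, a quick check like this can confirm whether the extension's methods are visible (shapes borrowed from the repro above; treat this as a sketch):

using Flux, AMDGPU, NNlib

# Confirm the extension module loaded at all (`nothing` means it never did):
Base.get_extension(NNlib, :NNlibAMDGPUExt)

# Ask which conv! method dispatch would pick for ROCArray arguments:
x = ROCArray(rand(Float32, 40, 40, 4, 1))
w = ROCArray(rand(Float32, 3, 3, 4, 32))
cdims = NNlib.DenseConvDims(x, w; padding=1)
y = similar(x, NNlib.output_size(cdims)..., 32, 1)
@which NNlib.conv!(y, x, w, cdims)
# If this points at NNlib's generic src/conv.jl fallback rather than a
# method from the extension, the MIOpen methods were never installed.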

A practical workaround that fixed it for me was forcing a clean re-precompile of the environment:

 rm -rf ~/.julia/compiled/v1.12
 julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

After clearing the compiled cache and re-precompiling, plain julia --project=. repro.jl started using the MIOpen path correctly. So if someone sees NNlibAMDGPUExt loaded but still gets im2col, it may be worth wiping ~/.julia/compiled/... and rebuilding first.


Is there an issue open for that? Sounds like there should be.

The simplest place for this to go wrong is here, but I’m not sure how functional(:MIOpen) could be false at precompile time. NNlib.jl/ext/NNlibAMDGPUExt/NNlibAMDGPUExt.jl at 0c599d869216822060cd19d4f774006dcb60d2b6 · FluxML/NNlib.jl · GitHub
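For context, the extension gates its MIOpen method definitions on that check at load time, roughly like this (a simplified sketch of the pattern, not the actual source; the module name here is made up):

module NNlibAMDGPUExtSketch  # hypothetical name for illustration

using NNlib, AMDGPU

if AMDGPU.functional(:MIOpen)
    # MIOpen-backed conv!/∇conv! methods for ROCArray are defined here,
    # so if this branch is not taken when the extension is (pre)compiled,
    # dispatch falls through to the generic im2col implementation.
else
    @warn "MIOpen is not available for NNlib."
end

end

If the branch condition were evaluated during precompilation in an environment where MIOpen probing fails, the cached extension would permanently lack the ROCArray methods, which would match the "works with --compiled-modules=no, fixed by wiping the cache" behavior above.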

It looks like this worked! Thank you!