Flux conv with GPU Arrays

Hello everyone,

I managed to build a UNet with Flux. On the CPU everything works, but when I move the model to the GPU (using Metal.jl), the convolutional layers throw an error.

      using Flux
      using Metal

      m = Dense(10 => 5) |> gpu
      x = rand(Float32, 10) |> gpu
      m(x) # this works fine

      m = Conv((3, 3), 1 => 64, pad=(2, 2)) |> gpu
      x = rand(Float32, 128, 128, 1, 1) |> gpu
      m(x) # but this gives me an error:

ERROR: TaskFailedException

nested task error: TaskFailedException
nested task error: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU, and therefore should be avoided.
If you want to allow scalar iteration, use allowscalar or @allowscalar
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:50 [inlined]
[6] scalar_getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:36 [inlined]
[7] _getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:19 [inlined]
[8] getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:17 [inlined]
[9] getindex
@ ./subarray.jl:320 [inlined]
[10] im2col!(col::MtlMatrix{…}, x::SubArray{…}, cdims::DenseConvDims{…})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:238
[11] (::NNlib.var"#612#613"{…})()
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:54
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:466
[2] macro expansion
@ ./task.jl:499 [inlined]
[3] conv_im2col!(y::SubArray{…}, x::SubArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; col::MtlArray{…}, alpha::Float32, beta::Float32, ntasks::Int64)
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:50
[4] conv_im2col!
@ ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:23 [inlined]
[5] (::NNlib.var"#284#288"{@Kwargs{}, DenseConvDims{…}, SubArray{…}, MtlArray{…}, SubArray{…}})()
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:209
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:466
[2] macro expansion
@ ./task.jl:499 [inlined]
[3] conv!(out::MtlArray{…}, in1::MtlArray{…}, in2::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:205
[4] conv!
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:185 [inlined]
[5] conv!(y::MtlArray{…}, x::MtlArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:145
[6] conv!
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:140 [inlined]
[7] conv(x::MtlArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:88
[8] conv
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:83 [inlined]
[9] (::Conv{2, 2, typeof(identity), MtlArray{…}, MtlVector{…}})(x::MtlArray{Float32, 4, Metal.PrivateStorage})
@ Flux ~/.julia/packages/Flux/vwk6M/src/layers/conv.jl:202
[10] top-level scope
@ ~/Nextcloud/ml_jl/demos/unet_segmentation/unet.jl:117
Some type information was truncated. Use show(err) to see complete types.
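
The error itself suggests allowscalar / @allowscalar as an escape hatch, but as far as I can tell that only disables the check and falls back to fetching elements one by one, so it is not a real fix (continuing the snippet above, sketch only):

      Metal.allowscalar(true)   # escape hatch from GPUArrays; only silences the scalar-indexing check
      m(x)                      # every element goes through scalar getindex, so this is extremely slow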

Metal support is very limited, and it’s likely that nobody has hooked up the required kernels here.

(On any recent Mac, using AppleAccelerate.jl is highly recommended.)
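
For example, a reasonable interim setup is to keep the whole model on the CPU and just load AppleAccelerate for the BLAS-heavy parts (rough sketch, reusing the layer from the original post):

      using Flux
      using AppleAccelerate   # forwards BLAS/LAPACK calls to Apple's Accelerate framework

      m = Conv((3, 3), 1 => 64, pad=(2, 2))   # no |> gpu: stay on the CPU
      x = rand(Float32, 128, 128, 1, 1)
      y = m(x)                                # runs through NNlib's CPU im2col path
      size(y)                                 # (130, 130, 64, 1) with pad=(2, 2)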

Just to clarify, this only speeds up CPU operations using BLAS/LAPACK, not GPU ones, right?

That’s right.
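
If you want to confirm which CPU BLAS is actually loaded, LinearAlgebra.BLAS.get_config() lists the backends behind libblastrampoline (quick sanity check, assuming AppleAccelerate is installed):

      using LinearAlgebra, AppleAccelerate
      BLAS.get_config()   # Accelerate should show up among the loaded LBT libraries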

Yep, the same code above works with CUDA.jl, so it seems the Metal kernels just don't exist yet.
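
For reference, roughly this on an NVIDIA machine (conv on CuArrays dispatches to cuDNN through NNlib, which is why it works there):

      using Flux, CUDA

      m = Conv((3, 3), 1 => 64, pad=(2, 2)) |> gpu   # CuArray weights
      x = rand(Float32, 128, 128, 1, 1) |> gpu
      m(x)   # goes through the cuDNN conv routines rather than the generic im2col path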

Conv support for Metal 100% does not exist right now, yes. PRs that wrap the Metal Performance Shaders conv routines (Missing functionalities - Metal with Conv and ConvTranspose layers · Issue #2278 · FluxML/Flux.jl · GitHub), add better errors for unsupported operations on Metal (Better errors for un-implemented functions · Issue #427 · FluxML/NNlib.jl · GitHub) or add a fallback KernelAbstractions.jl kernel for GPU conv (no tracking issue yet) would be most welcome. Until then, the status quo persists.
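
In case anyone wants to pick up that last option, here is a very rough sketch of what a naive KernelAbstractions.jl fallback could look like (stride 1, no padding, cross-correlation as in Flux's default; the names naive_conv2d!/naive_conv2d are made up here, and a proper NNlib implementation would need to handle DenseConvDims, padding, stride, dilation and kernel flipping):

      using KernelAbstractions

      # One work-item per output element of y :: (H', W', out_channels, batch).
      @kernel function naive_conv2d!(y, @Const(x), @Const(w))
          i, j, o, n = @index(Global, NTuple)
          acc = zero(eltype(y))
          for c in axes(w, 3), kj in axes(w, 2), ki in axes(w, 1)
              acc += x[i + ki - 1, j + kj - 1, c, n] * w[ki, kj, c, o]
          end
          y[i, j, o, n] = acc
      end

      function naive_conv2d(x, w)
          H, W, _, N = size(x)
          kH, kW, _, O = size(w)
          y = similar(x, H - kH + 1, W - kW + 1, O, N)   # "valid" output size, stride 1
          backend = get_backend(x)                       # MetalBackend() for MtlArrays
          naive_conv2d!(backend)(y, x, w; ndrange = size(y))
          KernelAbstractions.synchronize(backend)
          return y
      end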