Flux conv with GPU Arrays

Hello everyone,

I managed to build a UNet with Flux. On the CPU everything works, but when I move the model to the GPU (using Metal.jl), the convolutional layers throw an error.

      using Flux
      using Metal

      m = Dense(10 => 5) |> gpu
      x = rand(Float32, 10) |> gpu
      m(x) # this works fine

      m = Conv((3, 3), 1 => 64, pad=(2, 2)) |> gpu
      x = rand(Float32, 128, 128, 1, 1) |> gpu
      m(x) # but this gives me an error:

ERROR: TaskFailedException

nested task error: TaskFailedException
nested task error: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU, and therefore should be avoided.
If you want to allow scalar iteration, use allowscalar or @allowscalar
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:50 [inlined]
[6] scalar_getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:36 [inlined]
[7] _getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:19 [inlined]
[8] getindex
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/indexing.jl:17 [inlined]
[9] getindex
@ ./subarray.jl:320 [inlined]
[10] im2col!(col::MtlMatrix{…}, x::SubArray{…}, cdims::DenseConvDims{…})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:238
[11] (::NNlib.var"#612#613"{…})()
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:54
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:466
[2] macro expansion
@ ./task.jl:499 [inlined]
[3] conv_im2col!(y::SubArray{…}, x::SubArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; col::MtlArray{…}, alpha::Float32, beta::Float32, ntasks::Int64)
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:50
[4] conv_im2col!
@ ~/.julia/packages/NNlib/CkJqS/src/impl/conv_im2col.jl:23 [inlined]
[5] (::NNlib.var"#284#288"{@Kwargs{}, DenseConvDims{…}, SubArray{…}, MtlArray{…}, SubArray{…}})()
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:209
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:466
[2] macro expansion
@ ./task.jl:499 [inlined]
[3] conv!(out::MtlArray{…}, in1::MtlArray{…}, in2::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:205
[4] conv!
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:185 [inlined]
[5] conv!(y::MtlArray{…}, x::MtlArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:145
[6] conv!
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:140 [inlined]
[7] conv(x::MtlArray{…}, w::MtlArray{…}, cdims::DenseConvDims{…}; kwargs::@Kwargs{})
@ NNlib ~/.julia/packages/NNlib/CkJqS/src/conv.jl:88
[8] conv
@ ~/.julia/packages/NNlib/CkJqS/src/conv.jl:83 [inlined]
[9] (::Conv{2, 2, typeof(identity), MtlArray{…}, MtlVector{…}})(x::MtlArray{Float32, 4, Metal.PrivateStorage})
@ Flux ~/.julia/packages/Flux/vwk6M/src/layers/conv.jl:202
[10] top-level scope
@ ~/Nextcloud/ml_jl/demos/unet_segmentation/unet.jl:117
Some type information was truncated. Use show(err) to see complete types.
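
The error itself suggests allowscalar / @allowscalar as an escape hatch, but as far as I can tell that only disables the check and falls back to fetching elements one by one, so it is not a real fix (continuing the snippet above, sketch only):

      Metal.allowscalar(true)   # escape hatch from GPUArrays; only silences the scalar-indexing check
      m(x)                      # every element goes through scalar getindex, so this is extremely slow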

Metal support is very limited, and it’s likely that nobody has hooked up the required kernels here.

(On any recent Mac, using AppleAccelerate.jl is highly recommended.)
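
For example, a reasonable interim setup is to keep the whole model on the CPU and just load AppleAccelerate for the BLAS-heavy parts (rough sketch, reusing the layer from the original post):

      using Flux
      using AppleAccelerate   # forwards BLAS/LAPACK calls to Apple's Accelerate framework

      m = Conv((3, 3), 1 => 64, pad=(2, 2))   # no |> gpu: stay on the CPU
      x = rand(Float32, 128, 128, 1, 1)
      y = m(x)                                # runs through NNlib's CPU im2col path
      size(y)                                 # (130, 130, 64, 1) with pad=(2, 2)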

Just to clarify, this only speeds up CPU operations using BLAS/LAPACK, not GPU ones, right?

That’s right.
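
If you want to confirm which CPU BLAS is actually loaded, LinearAlgebra.BLAS.get_config() lists the backends behind libblastrampoline (quick sanity check, assuming AppleAccelerate is installed):

      using LinearAlgebra, AppleAccelerate
      BLAS.get_config()   # Accelerate should show up among the loaded LBT libraries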

Yep, the same code above works with CUDA.jl, so it seems the Metal kernels just don't exist yet.
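
For reference, roughly this on an NVIDIA machine (conv on CuArrays dispatches to cuDNN through NNlib, which is why it works there):

      using Flux, CUDA

      m = Conv((3, 3), 1 => 64, pad=(2, 2)) |> gpu   # CuArray weights
      x = rand(Float32, 128, 128, 1, 1) |> gpu
      m(x)   # goes through the cuDNN conv routines rather than the generic im2col path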

Conv support for Metal 100% does not exist right now, yes. PRs that wrap the Metal Performance Shaders conv routines (Missing functionalities - Metal with Conv and ConvTranspose layers · Issue #2278 · FluxML/Flux.jl · GitHub), add better errors for unsupported operations on Metal (Better errors for un-implemented functions · Issue #427 · FluxML/NNlib.jl · GitHub) or add a fallback KernelAbstractions.jl kernel for GPU conv (no tracking issue yet) would be most welcome. Until then, the status quo persists.
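
In case anyone wants to pick up that last option, here is a very rough sketch of what a naive KernelAbstractions.jl fallback could look like (stride 1, no padding, cross-correlation as in Flux's default; the names naive_conv2d!/naive_conv2d are made up here, and a proper NNlib implementation would need to handle DenseConvDims, padding, stride, dilation and kernel flipping):

      using KernelAbstractions

      # One work-item per output element of y :: (H', W', out_channels, batch).
      @kernel function naive_conv2d!(y, @Const(x), @Const(w))
          i, j, o, n = @index(Global, NTuple)
          acc = zero(eltype(y))
          for c in axes(w, 3), kj in axes(w, 2), ki in axes(w, 1)
              acc += x[i + ki - 1, j + kj - 1, c, n] * w[ki, kj, c, o]
          end
          y[i, j, o, n] = acc
      end

      function naive_conv2d(x, w)
          H, W, _, N = size(x)
          kH, kW, _, O = size(w)
          y = similar(x, H - kH + 1, W - kW + 1, O, N)   # "valid" output size, stride 1
          backend = get_backend(x)                       # MetalBackend() for MtlArrays
          naive_conv2d!(backend)(y, x, w; ndrange = size(y))
          KernelAbstractions.synchronize(backend)
          return y
      end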