Multi-headed Network on GPU with Flux

I just started learning Flux.jl. I would like to set up a network in which each initial layer takes only part of the input data and the outputs are concatenated afterwards. My code works fine on the CPU, but when I move it to the GPU I get an error message.

The following code is an MWE:

using CuArrays
using Flux
model = Chain(
    x -> cat(Dense(25, 10)(x[1:25, :]), Dense(25, 10)(x[26:50, :]), Dense(25, 10)(x[51:75, :]); dims=1)
) |> gpu
data = gpu(randn(75,10))
model(data)

I get the following error:

ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32,2}
Stacktrace:
 [1] cconvert(::Type{Ptr{Float32}}, ::CuArray{Float32,2}) at C:\Users\Janina\.julia\packages\CuArrays\PwSdF\src\array.jl:152
 [2] gemm!(::Char, ::Char, ::Float32, ::Array{Float32,2}, ::CuArray{Float32,2}, ::Float32, ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\LinearAlgebra\src\blas.jl:1122
 [3] gemm_wrapper!(::CuArray{Float32,2}, ::Char, ::Char, ::Array{Float32,2}, ::CuArray{Float32,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\LinearAlgebra\src\matmul.jl:461
 [4] * at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\LinearAlgebra\src\matmul.jl:144 [inlined]
...

On the CPU, the same code runs without error. My guess is that the subarray indexing is the problem. Is there a recommended way to get around this? A Flux Chain doesn’t seem to support multiple inputs, so I cannot just split the data beforehand.
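
One thing that may be worth trying: frame [2] of the stack trace shows gemm! receiving a plain Array{Float32,2} next to the CuArray, which suggests the Dense weights are still on the CPU rather than the indexing being at fault. That would fit the layers being constructed inside the anonymous function, where `|> gpu` cannot reach them. The following is only a minimal sketch under that assumption. The names d1, d2, d3 and the plain function `model` are illustrative and not part of the original code. On the setup from the question, `using CuArrays` would still be needed first. Without a working GPU, `gpu` leaves everything on the CPU and the sketch still runs.

```julia
using Flux

# Build each head once, outside the closure, and move it to the GPU explicitly.
d1 = Dense(25, 10) |> gpu
d2 = Dense(25, 10) |> gpu
d3 = Dense(25, 10) |> gpu

# Each head sees its own slice of the rows; the outputs are stacked along dim 1.
model(x) = vcat(d1(x[1:25, :]), d2(x[26:50, :]), d3(x[51:75, :]))

data = gpu(randn(Float32, 75, 10))
size(model(data))  # (30, 10)
```

With this layout the three layers are ordinary variables, so their parameters would have to be collected from d1, d2 and d3 explicitly for training instead of from a single Chain.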