Problem using gradient descent on two models on the GPU

Hi. I am using Flux with a GPU to train two models. Example code:

using CUDA
using Flux
using Random
Random.seed!(333)
model1 = Chain(
    Dense(3, 3) |> gpu
)

model2 = Chain(
    model1,
    Dense(ones(3,3),true,relu) |> gpu
)
model3 = Chain(
    model1,
    Dense(ones(3,3),true,relu) |> gpu
)
A = [1,2,3]
B = [1,1,1]
label = [10,10,10]
trainLoader1 = Flux.DataLoader((A, label), batchsize=64, shuffle=true) |> gpu
trainLoader2 = Flux.DataLoader((B, label), batchsize=64, shuffle=true) |> gpu
opt = Adam(0.1)
opt_stats = Flux.setup(opt, (model2, model3))
for i in 1:5
    global l1,l2
    gs = gradient(model2, model3) do m2, m3
        for (x, y) in trainLoader1
            l1 = Flux.mse(m2(x),y)
        end
        for (x, y) in trainLoader2
            l2 = Flux.mse(m3(x),y)
        end
        allLoss = l1 + l2
        @show l1
        @show l2
        @show allLoss
    end
    Flux.update!(opt_stats, (model2,model3), gs)
end

But I encountered this error:

ERROR: LoadError: GPU compilation of MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{Vector{Int64}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument

Does this mean I cannot wrap two models in a tuple when using the GPU?

I would suggest first of all creating a script that works on the CPU, i.e. removing the gpu calls. Your script probably won’t work even there. After you make sure everything works on the CPU, you can test on the GPU. Also, in general you should report the entire stacktrace, not just the final error.
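
To be concrete, here is an untested CPU-only sketch of what I mean, with the DataLoader and the globals removed (the Float32 literals and the 3 => 3 pair syntax are just my preferences, not requirements):

using Flux
using Random
Random.seed!(333)

# Shared first layer, plus two heads built on top of it.
model1 = Chain(Dense(3 => 3))
model2 = Chain(model1, Dense(ones(Float32, 3, 3), true, relu))
model3 = Chain(model1, Dense(ones(Float32, 3, 3), true, relu))

A = Float32[1, 2, 3]
B = Float32[1, 1, 1]
label = Float32[10, 10, 10]

opt_state = Flux.setup(Adam(0.1), (model2, model3))
for i in 1:5
    # The closure's return value is the quantity that gets differentiated.
    gs = gradient(model2, model3) do m2, m3
        Flux.mse(m2(A), label) + Flux.mse(m3(B), label)
    end
    Flux.update!(opt_state, (model2, model3), gs)
end

Once something like this runs, adding the gpu calls back one at a time should show where things break.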

Sorry, my fault.
I mistakenly used the trainloader… though I still don’t know how to use it properly.
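
From the error above (the Vector{Int64} appearing inside the GPU kernel), I think the batches coming out of my loader stayed on the CPU. Based on the docs, usage like the following should be correct (an untested sketch with made-up data; DataLoader treats the last dimension as the observation dimension):

using CUDA
using Flux

X = rand(Float32, 3, 100)      # 3 features, 100 observations
Y = rand(Float32, 3, 100)
loader = Flux.DataLoader((X, Y); batchsize = 64, shuffle = true)
for (x, y) in loader           # x, y are 3 x batchsize slices
    x, y = gpu(x), gpu(y)      # move each batch to the GPU here
    # ... compute the loss on x and y inside gradient ...
end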
Anyway, it runs smoothly without the trainloader:

using CUDA
using Flux
using Random
Random.seed!(333)
model1 = Chain(
    Dense(3, 3) |> gpu
)

model2 = Chain(
    model1,
    Dense(ones(3,3),true,relu) |> gpu
)
model3 = Chain(
    model1,
    Dense(ones(3,3),true,relu) |> gpu
)
A = [1.0,2.0,3.0] |> gpu
B = [1.0,1.0,1.0] |> gpu
label = [10,10,10] |> gpu
opt = Adam(0.1)
opt_stats = Flux.setup(opt, (model2, model3))
for i in 1:5
    global l1 = 0
    global l2 = 0
    gs = gradient(model2, model3) do m2, m3
        l1 += Flux.mse(m2(A),label)
        l2 = Flux.mse(m3(B),label)
        allLoss = l1 + l2
        @show l1
        @show l2
        @show allLoss
    end
    Flux.update!(opt_stats, (model2,model3), gs)
end
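
I think the globals are not actually needed either; returning the combined loss from the closure should do the same thing (untested sketch of the same loop):

for i in 1:5
    gs = gradient(model2, model3) do m2, m3
        l1 = Flux.mse(m2(A), label)
        l2 = Flux.mse(m3(B), label)
        l1 + l2   # the closure's return value is what gets differentiated
    end
    Flux.update!(opt_stats, (model2, model3), gs)
end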