Flux.jl: training fails on GPU but works on CPU

I have a Flux model that trains fine on the CPU but fails when I switch to the GPU; see the MWE below. With gpu_or_cpu set to cpu, the loss drops to about 0.004, but with gpu_or_cpu set to gpu, the loss increases instead.

I guess I am making a simple mistake. Any help is appreciated.

using CuArrays, Flux, ForwardDiff
using Flux: @epochs

CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) = x

N = 50 # n obs

gpu_or_cpu = gpu # do we run on cpu or gpu?

# generate some data (N obs)
x = Float32.( rand( 10, N)) |> gpu_or_cpu
y = Float32.( rand(  5, N)) |> gpu_or_cpu
data = [ ( x, y) ]

model = Chain( Dense( 10, 25, σ), Dense( 25, 5, sigmoid)) |> gpu_or_cpu
opt = ADAM( 0.01)
loss( x, y) = Flux.mse( model( x), y)

loss( x, y) # ~ 0.1 a priori
@epochs 10 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after 10 epochs:  ~ 0.09 on cpu;  ~ 0.17 on gpu
@epochs 1000 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after another 1000 epochs ~ 0.004 when cpu;  ~0.32 when gpu

Hmm, it looks like somewhere along my file iterations the definition of CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) got accidentally changed. It should return x*x, not x, of course:

CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) = x*x
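For anyone wondering why this one-character slip makes the loss go up rather than just stall, here is a minimal CPU-only sketch. It assumes the override is what computes the derivative of the squared error on the GPU; broken_sq and fixed_sq are hypothetical stand-ins for the two versions of the override, not CuArrays functions, and the numbers are made up.

```julia
using ForwardDiff

# Hypothetical stand-ins for the two versions of the literal-power override.
broken_sq(d) = d        # what the MWE defines: returns x instead of x^2
fixed_sq(d)  = d * d    # what it should compute

# Made-up prediction errors (prediction minus target), both signs.
errors = Float32[-0.5, -0.1, 0.2, 0.6]

for d in errors
    g_broken = ForwardDiff.derivative(broken_sq, d)  # constant, ignores the error
    g_fixed  = ForwardDiff.derivative(fixed_sq, d)   # 2d, sign follows the error
    println("error = ", d, "  broken grad = ", g_broken, "  fixed grad = ", g_fixed)
end
```

With the broken version the derivative is the same positive constant whatever the sign of the error, so every update pushes the predictions down instead of toward the targets. If the sigmoid outputs then drift toward 0, the true MSE against targets drawn uniformly from [0, 1] would approach E[y^2] = 1/3, which is in line with the ~0.32 I saw on the GPU. That last part is my reading of the numbers, not something I verified separately.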