I have a Flux model that works OK on the CPU, but it fails when I switch to the GPU (see the MWE below). When I set gpu_or_cpu to cpu, the loss drops to 0.004, but if I set it to gpu, the loss increases.
I guess I am making a simple mistake. Any help is appreciated.
using CuArrays, Flux, ForwardDiff
using Flux: @epochs
CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) = x
N = 50 # n obs
gpu_or_cpu = gpu # do we run on cpu or gpu?
# generate some data (N obs)
x = Float32.( rand( 10, N)) |> gpu_or_cpu
y = Float32.( rand( 5, N)) |> gpu_or_cpu
data = [ ( x, y) ]
model = Chain( Dense( 10, 25, σ), Dense( 25, 5, sigmoid)) |> gpu_or_cpu
opt = ADAM( 0.01)
loss( x, y) = Flux.mse( model( x), y)
loss( x, y) # ~ 0.1 a priori
@epochs 10 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after 10 epochs: ~ 0.09 on cpu; ~ 0.17 on gpu
@epochs 1000 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after another 1000 epochs ~ 0.004 when cpu; ~0.32 when gpu
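One thing I have not ruled out is the culiteral_pow override near the top of the MWE: as written it returns x for x^2 on Dual numbers, whereas I would expect x*x. I am assuming (not verified) that the GPU broadcast inside Flux.mse goes through this method when computing gradients; if so, the squared error would be differentiated as if it were linear. Below is a CPU-only sketch of the difference. The two helper names are hypothetical and are not part of CuArrays.

```julia
using ForwardDiff

# Hypothetical stand-ins for the two possible bodies of the override;
# these names are for illustration only and are not CuArrays functions.
pow2_as_written(x) = x        # what the override in the MWE returns for x^2
pow2_expected(x)   = x * x    # what x^2 should evaluate to

# Differentiate both at the same point using forward-mode Dual numbers.
d_written  = ForwardDiff.derivative(pow2_as_written, 3f0)  # derivative of x
d_expected = ForwardDiff.derivative(pow2_expected, 3f0)    # derivative of x^2

println((d_written, d_expected))
```

The two derivatives differ (1 versus 6 at x = 3), so if the override is what the GPU path actually uses, the gradients passed to ADAM would be wrong. Changing the right-hand side of the override to x*x may be worth trying.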