Flux.jl: training fails on GPU but works on CPU

Martijn-R · September 17, 2019, 10:15pm

I have a Flux model that trains fine on the CPU but fails when I switch to the GPU; see the MWE below. With gpu_or_cpu set to cpu, the loss drops from about 0.1 to about 0.004. With gpu_or_cpu set to gpu, the loss increases instead, to about 0.32.

I guess I am making a simple mistake. Any help is appreciated.

using CuArrays, Flux, ForwardDiff
using Flux: @epochs

CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) = x

N = 50 # n obs

gpu_or_cpu = gpu # do we run on cpu or gpu?

# generate some data (N obs)
x = Float32.( rand( 10, N)) |> gpu_or_cpu
y = Float32.( rand(  5, N)) |> gpu_or_cpu
data = [ ( x, y) ]

model = Chain( Dense( 10, 25, σ), Dense( 25, 5, sigmoid)) |> gpu_or_cpu
opt = ADAM( 0.01)
loss( x, y) = Flux.mse( model( x), y)

loss( x, y) # ~ 0.1 a priori
@epochs 10 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after 10 epochs:  ~ 0.09 on cpu;  ~ 0.17 on gpu
@epochs 1000 Flux.train!( loss, params( model), data, opt)
loss( x, y) # after another 1000 epochs ~ 0.004 when cpu;  ~0.32 when gpu

Martijn-R · September 19, 2019, 1:06am

Hmm, it looks like somewhere along my file iterations the definition of CuArrays.culiteral_pow(::typeof(^), x::ForwardDiff.Dual{Nothing,Float32,1}, ::Val{2}) got changed by accident. It should return x*x, not x, of course.
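
To illustrate why such a small slip can make the loss go up rather than down, here is a minimal sketch in plain Julia. It rests on an assumption that has not been verified inside CuArrays: that the override is only reached with Dual numbers while differentiating the broadcasted .^2 in Flux.mse, so the reported loss value stays correct while the derivative of the square comes out as 1 instead of 2d. The names dsquare_fixed, dsquare_buggy and final_loss are made up for this example; it uses no Flux or CuArrays calls.

```julia
# Hypothetical stand-ins (not CuArrays API): the derivative of the "square"
# with respect to its argument d, as each version of the override would imply.
dsquare_fixed(d) = 2d        # derivative of x*x
dsquare_buggy(d) = one(d)    # derivative of x, i.e. what the broken override yields

# Plain gradient descent on a single prediction p, minimizing (p - target)^2.
# Only the derivative rule differs between runs; the loss formula is the same.
function final_loss(dsquare; p = 0.5f0, target = 0.8f0, lr = 0.1f0, steps = 100)
    for _ in 1:steps
        p -= lr * dsquare(p - target)
    end
    return (p - target)^2
end

loss_start = (0.5f0 - 0.8f0)^2
loss_fixed = final_loss(dsquare_fixed)   # error shrinks on every step
loss_buggy = final_loss(dsquare_buggy)   # p is pushed down on every step

println((loss_start, loss_fixed, loss_buggy))
```

With the correct derivative the error shrinks on each step. With the constant derivative the prediction is pushed in the same direction regardless of the sign of the error, so the loss ends up well above where it started, which matches the direction of the GPU numbers in the MWE.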
