When I try to minimize a quadratic function f(x) of a single scalar parameter x via gradient descent in Flux, there is no progress at all (x never changes), while the single-component array version of the same code works as expected:
using Flux

data = Iterators.repeated([], 5) # 5 iterations of empty data
opt = Descent()

let x = 3.0, f(x) = x^2
    println("No updates observed in scalar version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end

let x = [3.0], f(x) = x[1]^2
    println("Updates seen as expected in 1-d array version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end
Under Flux 0.10 (CPU, not GPU) with Julia 1.3 on macOS Mojave (2014 MacBook Pro 15″), I get this output in a freshly restarted IJulia notebook:
┌ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
└ @ CUDAdrv /Users/jonas/.julia/packages/CUDAdrv/3EzC1/src/CUDAdrv.jl:69
No updates observed in scalar version:
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
Updates seen as expected in 1-d array version:
(x, f(x)) = ([2.4], 5.76)
(x, f(x)) = ([1.92], 3.6864)
(x, f(x)) = ([1.536], 2.359296)
(x, f(x)) = ([1.2288000000000001], 1.5099494400000002)
(x, f(x)) = ([0.9830400000000001], 0.9663676416000002)
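
For what it's worth, the derivative itself seems fine: taking the gradient explicitly (with the gradient function that Flux 0.10 re-exports from Zygote) returns the expected value for the scalar:

# Quick check: the explicit gradient of the scalar is computed correctly.
gradient(v -> v^2, 3.0)  # returns (6.0,)
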
Shouldn't both give essentially the same result (apart from the scalar vs. single-component-array distinction in the output)?
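
As a workaround I can rebind the scalar myself in a manual descent loop; the sketch below assumes a learning rate of 0.1, which appears to be Descent()'s default judging by the array run above (3.0 → 2.4). It reproduces the array version's trajectory, but I'd still like to understand why train! behaves differently:

# Manual gradient descent on the scalar, rebinding x each step.
let x = 3.0, η = 0.1
    for _ in 1:5
        g = first(gradient(v -> v^2, x))  # explicit gradient w.r.t. x
        x -= η * g  # rebinds the local x; a Float64 cannot be mutated in place
        @show x, x^2
    end
end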