Scalar parameter doesn't update in Flux, but array does

When I try to minimize a quadratic function f(x) of a single scalar parameter x via gradient descent in Flux, the parameter never changes, whereas the equivalent one-element-array version works correctly:

using Flux
data = Iterators.repeated([], 5)  # 5 iterations of empty data
opt = Descent()
let x = 3.0, f(x) = x^2
    println("No updates observed in scalar version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end
let x = [3.0], f(x) = x[1]^2
    println("Updates seen as expected in 1-d array version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end

Under Flux 0.10 (CPU, not GPU) with Julia 1.3 on macOS Mojave (2014 MacBook Pro 15"), I get this output in a restarted IJulia notebook:

┌ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
└ @ CUDAdrv /Users/jonas/.julia/packages/CUDAdrv/3EzC1/src/CUDAdrv.jl:69

No updates observed in scalar version:
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
Updates seen as expected in 1-d array version:
(x, f(x)) = ([2.4], 5.76)
(x, f(x)) = ([1.92], 3.6864)
(x, f(x)) = ([1.536], 2.359296)
(x, f(x)) = ([1.2288000000000001], 1.5099494400000002)
(x, f(x)) = ([0.9830400000000001], 0.9663676416000002)

Shouldn’t both give essentially the same result (apart from the scalar vs. one-element-array distinction in the output)?


What does params(x) return for scalar x? I guess nothing.

Try wrapping scalar x in a Params struct before using it.

Yes, params(x) gives an empty list for a scalar, which is very odd:

x = 3.0
@show params(x)
x_array = [3.0]
@show params(x_array)

displays

params(x) = Params([])
params(x_array) = Params([[3.0]])

Wrapping doesn’t seem to help, unfortunately:

x = 3.0
@show params(Params(x))

displays

params(Params(x)) = Params([])

You can always manually do f'(x) and call the optimizer yourself instead of using the implicit parameter interface.

Thanks for the suggestion. Unfortunately, taking f' and then using the optimizer via update! instead of train! has the same problem:

using Flux, Flux.Optimise
function mytrain(x, f)
    opt = Descent()
    for i in 1:5
        g = f'(x)
        Optimise.update!(opt, x, g)
        @show x, g, f(x) 
    end
end
println("Using array:")
mytrain([3.0], x -> x[1]^2)
println("Using scalar:")
mytrain(3.0, x -> x^2)

It errors in the scalar case:

Using array:
(x, g, f(x)) = ([2.4], [0.6000000000000001], 5.76)
(x, g, f(x)) = ([1.92], [0.48], 3.6864)
(x, g, f(x)) = ([1.536], [0.384], 2.359296)
(x, g, f(x)) = ([1.2288000000000001], [0.30720000000000003], 1.5099494400000002)
(x, g, f(x)) = ([0.9830400000000001], [0.24576000000000003], 0.9663676416000002)
Using scalar:

MethodError: no method matching copyto!(::Float64, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Tuple{},typeof(*),Tuple{Float64,Float64}})
Closest candidates are:
  copyto!(!Matched::Union{Base.ReshapedArray{#s27,#s26,AT,#s22} where #s22 where #s26 where #s27 where AT<:GPUArrays.GPUArray, SubArray{#s33,#s32,AT,I,L} where L where I where #s32 where #s33 where AT<:GPUArrays.GPUArray, GPUArrays.GPUArray, LinearAlgebra.Adjoint{#s21,AT} where #s21 where AT<:GPUArrays.GPUArray, LinearAlgebra.Diagonal{#s12,AT} where #s12 where AT<:GPUArrays.GPUArray, LinearAlgebra.LowerTriangular{#s19,AT} where #s19 where AT<:GPUArrays.GPUArray, LinearAlgebra.Transpose{#s20,AT} where #s20 where AT<:GPUArrays.GPUArray, LinearAlgebra.Tridiagonal{#s79,AT} where #s79 where AT<:GPUArrays.GPUArray, LinearAlgebra.UnitLowerTriangular{#s18,AT} where #s18 where AT<:GPUArrays.GPUArray, LinearAlgebra.UnitUpperTriangular{#s13,AT} where #s13 where AT<:GPUArrays.GPUArray, LinearAlgebra.UpperTriangular{#s17,AT} where #s17 where AT<:GPUArrays.GPUArray, PermutedDimsArray{#s31,#s30,#s29,#s28,AT} where #s28 where #s29 where #s30 where #s31 where AT<:GPUArrays.GPUArray}, ::Base.Broadcast.Broadcasted{#s79,Axes,F,Args} where Args<:Tuple where F where Axes where #s79<:Base.Broadcast.AbstractArrayStyle{0}) at /Users/jonas/.julia/packages/GPUArrays/1wgPO/src/broadcast.jl:60
  copyto!(!Matched::AbstractArray, ::Base.Broadcast.Broadcasted{#s627,Axes,F,Args} where Args<:Tuple where F where Axes where #s627<:Base.Broadcast.AbstractArrayStyle{0}) at broadcast.jl:869
  copyto!(!Matched::AbstractArray, ::Base.Broadcast.Broadcasted) at broadcast.jl:863
  ...

Stacktrace:
 [1] materialize! at ./broadcast.jl:822 [inlined]
 [2] apply!(::Descent, ::Float64, ::Float64) at /Users/jonas/.julia/packages/Flux/oX9Pi/src/optimise/optimisers.jl:40
 [3] update!(::Descent, ::Float64, ::Float64) at /Users/jonas/.julia/packages/Flux/oX9Pi/src/optimise/train.jl:10
 [4] mytrain(::Float64, ::var"#38#39") at ./In[18]:6
 [5] top-level scope at In[18]:13

I suspect the scalar case is failing because a Float64 is immutable (a plain value), while the array is mutable (a reference), so the in-place update! has nothing it can write into.
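
For what it's worth, here is a minimal sketch of a purely scalar workaround: rebind x on each step instead of mutating it in place (the 0.1 step size is just Descent()'s default written out explicitly):

using Flux  # brings in Zygote, so f'(x) gives the scalar derivative

# Sketch: a Float64 can't be updated in place, so rebind x to the new
# value on every iteration instead of relying on update!'s mutation.
let x = 3.0, f(x) = x^2, η = 0.1   # η = 0.1 matches Descent()'s default step size
    for i in 1:5
        g = f'(x)        # scalar gradient via Zygote
        x = x - η * g    # explicit rebinding; no in-place mutation needed
        @show x, g, f(x)
    end
end

With these values it follows the same trajectory as the array version (2.4, 1.92, 1.536, …) while keeping x a plain scalar.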

Did you ever come up with a solution for this? I am facing this problem right now.

I guess the hacky way is to just wrap the scalar as [x] and do that everywhere.

This is what I have been doing, but my main issue is that the loss function needs to either index into the array or handle broadcasting properly, making this solution even more cumbersome.


Maybe the @. macro would make life easier?
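
Something along these lines, perhaps (just a sketch of the wrapping idea, with an illustrative loss): keep the parameter as a one-element array so Flux can mutate it, and write the loss with @. so no explicit x[1] indexing is needed:

using Flux

x = [3.0]                         # one-element array wrapper around the scalar
loss() = sum(@. x^2)              # @. broadcasts over the array; sum makes the loss a scalar
opt = Descent()
data = Iterators.repeated([], 5)  # 5 dummy iterations, as in the original example
Flux.train!(loss, params(x), data, opt, cb = () -> @show x, loss())

That keeps train! and update! happy (the array is mutable) while the loss itself reads almost like the scalar version.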
