Scalar parameter doesn't update in Flux, but array does

When I try to minimize a quadratic function f(x) of a single scalar parameter x via gradient descent in Flux, the parameter never changes, whereas the equivalent one-element-array version works correctly:

using Flux
data = Iterators.repeated([], 5)  # 5 iterations of empty data
opt = Descent()
let x = 3.0, f(x) = x^2
    println("No updates observed in scalar version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end
let x = [3.0], f(x) = x[1]^2
    println("Updates seen as expected in 1-d array version:")
    Flux.train!(() -> f(x), params(x), data, opt, cb = () -> @show x, f(x))
end

Under Flux 0.10 (CPU, not GPU) with Julia 1.3 on macOS Mojave (2014 MacBook Pro 15"), I get this output in a restarted IJulia notebook:

┌ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
└ @ CUDAdrv /Users/jonas/.julia/packages/CUDAdrv/3EzC1/src/CUDAdrv.jl:69

No updates observed in scalar version:
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
(x, f(x)) = (3.0, 9.0)
Updates seen as expected in 1-d array version:
(x, f(x)) = ([2.4], 5.76)
(x, f(x)) = ([1.92], 3.6864)
(x, f(x)) = ([1.536], 2.359296)
(x, f(x)) = ([1.2288000000000001], 1.5099494400000002)
(x, f(x)) = ([0.9830400000000001], 0.9663676416000002)

Shouldn’t both give essentially the same result (apart from the scalar vs. one-element-array distinction in the output)?


What does params(x) return for scalar x? I guess nothing.

Try wrapping scalar x in a Params struct before using it.

Yes, params(x) gives an empty list for a scalar, which is very odd:

x = 3.0
@show params(x)
x_array = [3.0]
@show params(x_array)

displays

params(x) = Params([])
params(x_array) = Params([[3.0]])

Wrapping doesn’t seem to help, unfortunately:

x = 3.0
@show params(Params(x))

displays

params(Params(x)) = Params([])

You can always manually do f'(x) and call the optimizer yourself instead of using the implicit parameter interface.

Thanks for the suggestion. Unfortunately, taking f' and then using the optimizer via update! instead of train! has the same problem:

using Flux, Flux.Optimise
function mytrain(x, f)
    opt = Descent()
    for i in 1:5
        g = f'(x)
        Optimise.update!(opt, x, g)
        @show x, g, f(x) 
    end
end
println("Using array:")
mytrain([3.0], x -> x[1]^2)
println("Using scalar:")
mytrain(3.0, x -> x^2)

It errors in the scalar case:

Using array:
(x, g, f(x)) = ([2.4], [0.6000000000000001], 5.76)
(x, g, f(x)) = ([1.92], [0.48], 3.6864)
(x, g, f(x)) = ([1.536], [0.384], 2.359296)
(x, g, f(x)) = ([1.2288000000000001], [0.30720000000000003], 1.5099494400000002)
(x, g, f(x)) = ([0.9830400000000001], [0.24576000000000003], 0.9663676416000002)
Using scalar:

MethodError: no method matching copyto!(::Float64, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Tuple{},typeof(*),Tuple{Float64,Float64}})
Closest candidates are:
  copyto!(!Matched::Union{Base.ReshapedArray{#s27,#s26,AT,#s22} where #s22 where #s26 where #s27 where AT<:GPUArrays.GPUArray, SubArray{#s33,#s32,AT,I,L} where L where I where #s32 where #s33 where AT<:GPUArrays.GPUArray, GPUArrays.GPUArray, LinearAlgebra.Adjoint{#s21,AT} where #s21 where AT<:GPUArrays.GPUArray, LinearAlgebra.Diagonal{#s12,AT} where #s12 where AT<:GPUArrays.GPUArray, LinearAlgebra.LowerTriangular{#s19,AT} where #s19 where AT<:GPUArrays.GPUArray, LinearAlgebra.Transpose{#s20,AT} where #s20 where AT<:GPUArrays.GPUArray, LinearAlgebra.Tridiagonal{#s79,AT} where #s79 where AT<:GPUArrays.GPUArray, LinearAlgebra.UnitLowerTriangular{#s18,AT} where #s18 where AT<:GPUArrays.GPUArray, LinearAlgebra.UnitUpperTriangular{#s13,AT} where #s13 where AT<:GPUArrays.GPUArray, LinearAlgebra.UpperTriangular{#s17,AT} where #s17 where AT<:GPUArrays.GPUArray, PermutedDimsArray{#s31,#s30,#s29,#s28,AT} where #s28 where #s29 where #s30 where #s31 where AT<:GPUArrays.GPUArray}, ::Base.Broadcast.Broadcasted{#s79,Axes,F,Args} where Args<:Tuple where F where Axes where #s79<:Base.Broadcast.AbstractArrayStyle{0}) at /Users/jonas/.julia/packages/GPUArrays/1wgPO/src/broadcast.jl:60
  copyto!(!Matched::AbstractArray, ::Base.Broadcast.Broadcasted{#s627,Axes,F,Args} where Args<:Tuple where F where Axes where #s627<:Base.Broadcast.AbstractArrayStyle{0}) at broadcast.jl:869
  copyto!(!Matched::AbstractArray, ::Base.Broadcast.Broadcasted) at broadcast.jl:863
  ...

Stacktrace:
 [1] materialize! at ./broadcast.jl:822 [inlined]
 [2] apply!(::Descent, ::Float64, ::Float64) at /Users/jonas/.julia/packages/Flux/oX9Pi/src/optimise/optimisers.jl:40
 [3] update!(::Descent, ::Float64, ::Float64) at /Users/jonas/.julia/packages/Flux/oX9Pi/src/optimise/train.jl:10
 [4] mytrain(::Float64, ::var"#38#39") at ./In[18]:6
 [5] top-level scope at In[18]:13

I suspect the scalar case is failing because a Float64 is immutable (a plain value), while the array is mutable (a reference), so the in-place update! has nothing it can write into.
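
For what it's worth, here is a minimal sketch of a purely scalar workaround: rebind x on each step instead of mutating it in place (the 0.1 step size is just Descent()'s default written out explicitly):

using Flux  # brings in Zygote, so f'(x) gives the scalar derivative

# Sketch: a Float64 can't be updated in place, so rebind x to the new
# value on every iteration instead of relying on update!'s mutation.
let x = 3.0, f(x) = x^2, η = 0.1   # η = 0.1 matches Descent()'s default step size
    for i in 1:5
        g = f'(x)        # scalar gradient via Zygote
        x = x - η * g    # explicit rebinding; no in-place mutation needed
        @show x, g, f(x)
    end
end

With these values it follows the same trajectory as the array version (2.4, 1.92, 1.536, …) while keeping x a plain scalar.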

Did you ever come up with a solution for this? I am facing this problem right now.

I guess the hacky way is to just wrap the scalar as [x] and do that everywhere.

This is what I have been doing, but my main issue is that the loss function needs to either index into the array or handle broadcasting properly, making this solution even more cumbersome.


Maybe the @. macro would make life easier?
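
Something along these lines, perhaps (just a sketch of the wrapping idea, with an illustrative loss): keep the parameter as a one-element array so Flux can mutate it, and write the loss with @. so no explicit x[1] indexing is needed:

using Flux

x = [3.0]                         # one-element array wrapper around the scalar
loss() = sum(@. x^2)              # @. broadcasts over the array; sum makes the loss a scalar
opt = Descent()
data = Iterators.repeated([], 5)  # 5 dummy iterations, as in the original example
Flux.train!(loss, params(x), data, opt, cb = () -> @show x, loss())

That keeps train! and update! happy (the array is mutable) while the loss itself reads almost like the scalar version.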
