Training on Simple Loss gives error: No method matching +(::Float64, ::Array{Float64, 2})

Hello everyone!

I’m getting a weird error with Flux/DiffEqFlux when I try to train with a simple loss function. I’m not sure why it gives me ERROR: LoadError: MethodError: no method matching +(::Float64, ::Array{Float64,2}). The loss function works fine when I evaluate it myself.

Could anyone suggest how to debug this, or a better way to write it?

(It’ll be part of a slightly bigger SciML problem, which is why I’d like to use DiffEqFlux)

A simplified, runnable example that reproduces the error is below.

using DiffEqFlux
using Flux

U = FastChain(FastDense(1, 10, sigmoid), FastDense(10, 1))
function loss(θ)
    # Evaluate U neural network at y and return scalar
    nn_windspeed = y -> U(y, θ)[1]
    y = -5:0.1:5
    windspeeds = nn_windspeed.(y)
    truewinds = y.^2 #example
    return sum(abs2, (windspeeds .- truewinds))
end
loss(initial_params(U)) #works
DiffEqFlux.sciml_train(loss, initial_params(U), ADAM(1.0e-2), maxiters = 200) #error

Try y = Float32.(-5:0.1:5); I believe that is the source of the problem.
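In the posted loss function that change would look roughly like this (only the one line is different; untested sketch):

function loss(θ)
    nn_windspeed = y -> U(y, θ)[1]
    y = Float32.(-5:0.1:5) # convert inputs to Float32 to match the Float32 parameters from initial_params(U)
    windspeeds = nn_windspeed.(y)
    truewinds = y.^2 #example
    return sum(abs2, (windspeeds .- truewinds))
end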

Thank you. Unfortunately y = Float32.(-5:0.1:5) didn’t work. I still get the same error.

There seems to be a problem with nn_windspeed = y -> U(y, θ)[1] and windspeeds = nn_windspeed.(y). I suspect this because that is the line frame [10] of the stack trace below points to.

ERROR: LoadError: MethodError: no method matching +(::Float64, ::Array{Float64,2})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
  +(::ChainRulesCore.DoesNotExist, ::Any) at ~/.julia/packages/ChainRulesCore/DBATp/src/differential_arithmetic.jl:23
  +(::ChainRulesCore.One, ::Any) at ~/.julia/packages/ChainRulesCore/DBATp/src/differential_arithmetic.jl:94
  ...
Stacktrace:
 [1] accum(::Float64, ::Array{Float64,2}) at ~/.julia/packages/Zygote/pM10l/src/lib/lib.jl:8
 [2] _broadcast_getindex_evalf at ./broadcast.jl:648 [inlined]
 [3] _broadcast_getindex at ./broadcast.jl:621 [inlined]
 [4] getindex at ./broadcast.jl:575 [inlined]
 [5] copy at ./broadcast.jl:876 [inlined]
 [6] materialize at ./broadcast.jl:837 [inlined]
 [7] accum(::Array{Float64,1}, ::Array{Array{Float64,2},1}) at ~/.julia/packages/Zygote/pM10l/src/lib/lib.jl:16
 [8] getindex at ./tuple.jl:24 [inlined]
 [9] gradindex at ~/.julia/packages/Zygote/pM10l/src/compiler/reverse.jl:12 [inlined]
 [10] loss at current_file_script.jl:9 [inlined]
 [11] (::typeof(∂(loss)))(::Float64) at ~/.julia/packages/Zygote/pM10l/src/compiler/interface2.jl:0
 [12] #69 at ~/.julia/packages/DiffEqFlux/lS4Sa/src/train.jl:3 [inlined]
 [13] #178 at ~/.julia/packages/Zygote/pM10l/src/lib/lib.jl:194 [inlined]
 [14] #1698#back at ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
 [15] OptimizationFunction at ~/.julia/packages/SciMLBase/fypD8/src/problems/basic_problems.jl:107 [inlined]
 [16] #178 at ~/.julia/packages/Zygote/pM10l/src/lib/lib.jl:194 [inlined]
 [17] #1698#back at ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
 [18] OptimizationFunction at ~/.julia/packages/SciMLBase/fypD8/src/problems/basic_problems.jl:107 [inlined]
 [19] #178 at ~/.julia/packages/Zygote/pM10l/src/lib/lib.jl:194 [inlined]
 [20] (::Zygote.var"#1698#back#180"{Zygote.var"#178#179"{typeof(∂(λ)),Tuple{Tuple{Nothing,Nothing},Int64}}})(::Float64) at ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
 [21] #8 at ~/.julia/packages/GalacticOptim/Ha1cY/src/solve.jl:94 [inlined]
 [22] (::typeof(∂(λ)))(::Float64) at ~/.julia/packages/Zygote/pM10l/src/compiler/interface2.jl:0
 [23] (::Zygote.var"#69#70"{Zygote.Params,Zygote.Context,typeof(∂(λ))})(::Float64) at ~/.julia/packages/Zygote/pM10l/src/compiler/interface.jl:252
 [24] gradient(::Function, ::Zygote.Params) at ~/.julia/packages/Zygote/pM10l/src/compiler/interface.jl:59
 [25] __solve(::SciMLBase.OptimizationProblem{false,SciMLBase.OptimizationFunction{false,GalacticOptim.AutoZygote,SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{typeof(loss)},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},GalacticOptim.var"#146#156"{GalacticOptim.var"#145#155"{SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{typeof(loss)},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Nothing}},GalacticOptim.var"#149#159"{GalacticOptim.var"#145#155"{SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{typeof(loss)},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Nothing}},GalacticOptim.var"#154#164",Nothing,Nothing,Nothing},Array{Float32,1},SciMLBase.NullParameters,Nothing,Nothing,Nothing,Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxiters,),Tuple{Int64}}}}, ::ADAM, ::Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}; maxiters::Int64, cb::Function, progress::Bool, save_best::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ~/.julia/packages/GalacticOptim/Ha1cY/src/solve.jl:93
 [26] #solve#450 at ~/.julia/packages/SciMLBase/fypD8/src/solve.jl:3 [inlined]
 [27] sciml_train(::typeof(loss), ::Array{Float32,1}, ::ADAM, ::GalacticOptim.AutoZygote; lower_bounds::Nothing, upper_bounds::Nothing, kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxiters,),Tuple{Int64}}}) at ~/.julia/packages/DiffEqFlux/lS4Sa/src/train.jl:6
 [28] top-level scope at current_file_script.jl:14
 [29] include(::String) at ./client.jl:457
 [30] top-level scope at REPL[20]:1
 [31] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288
in expression starting at current_file_script.jl:14

y needs to be a row vector, and then you don’t have to broadcast at all:

function loss(θ)
    y = (-5:0.1:5)'
    windspeeds = U(y, θ)
    truewinds = y.^2 #example
    return sum(abs2, (windspeeds .- truewinds))
end
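For the intuition: a FastDense layer is essentially σ.(W*x .+ b), so each column of the input is treated as one sample. A 1×101 row vector therefore gives a 1×101 output that lines up element-wise with y.^2. A quick shape check along these lines (untested sketch; θ0 is just a fresh parameter vector):

θ0 = initial_params(U)
y = (-5:0.1:5)'    # 1×101: one input feature, 101 samples
size(U(y, θ0))     # (1, 101): one prediction per column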

Thank you so much! This was the solution, and I didn’t have to do the weird broadcast and indexing. I’ll definitely be more careful about when the input should be a row vector and how it’s arranged.

Just FYI, broadcasting might help with performance for very large batch sizes. For that you can do something like windspeeds = U.(eachcol(y), Ref(θ)).
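In the loss above that could look something like this (rough, untested sketch; reduce(hcat, ...) just stitches the per-sample outputs back into a 1×N matrix, and Ref(θ) stops θ itself from being broadcast over):

function loss(θ)
    y = (-5:0.1:5)'
    windspeeds = reduce(hcat, U.(eachcol(y), Ref(θ))) # one U call per column (sample)
    truewinds = y.^2 #example
    return sum(abs2, (windspeeds .- truewinds))
end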


Thank you so much for the tip! I didn’t know about eachcol or that broadcasting could be used in that way.

Do you know why it gives a performance boost for large datasets?

I haven’t used the fast layers in DiffEqFlux much, but IIRC they are optimized to run fast when the layer sizes are below roughly 100-200, and the same would hold for the batch of data you feed in. If your batches are in the thousands, I’m not sure how much benefit you get from these fast layers. But it’s possible I’m wrong about this; I mostly use Flux, and there it is definitely faster to pass the dataset as a matrix.
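For comparison, a minimal sketch of the pass-the-dataset-as-a-matrix pattern in plain Flux (m is just a hypothetical chain with the same shape as U above):

using Flux

m = Chain(Dense(1, 10, sigmoid), Dense(10, 1))
y = reshape(collect(Float32, -5:0.1:5), 1, :) # 1×101 matrix: features along rows, samples along columns
ŷ_matrix = m(y)                               # one forward pass over the whole batch
ŷ_percol = reduce(hcat, m.(eachcol(y)))       # same result, but one call per sample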


That is true.
