Nonlinear Least Squares: Millions of Allocations

Overview

An algorithm I am working on requires solving many largish nonlinear least squares problems, and the number of allocations is exploding. My nonlinear functions are efficient when passed Arrays of Float64 and generate few allocations, but when they are passed as an operator to Ipopt and evaluated with VariableRef or AffExpr, the evaluation time and number of allocations explode.

I can provide an analytic gradient which is also efficient to compute with Float64s, so automatic differentiation should not be necessary. However, I do not see my gradient function being evaluated, and I think this is part of why the code is so slow.

I am looking for advice on how to move forward and make solving nonlinear least squares problems in Julia as fast as possible.

I should mention I am using Julia v1.10.4 on an Intel Mac.

Background

You can probably skip this if you don’t care about the motivation and just want to see the coding problem.

I am trying to solve this problem many times because it arises from an iterative re-weighted non-linear least squares problem:
\min_w \tfrac{1}{2} \big\|(R^T)^{-1}(G(w;u)-b)\big\|_2^2.
This comes from trying to approximate the parameters of a differential equation of the form
\dot{u}(t) = F(u(t), w)
where u(t) is the dependent variable (possibly a vector) and w are the parameters. I am extending the WENDy algorithm to handle ODEs (and then PDEs) that are nonlinear in parameters (i.e., the RHS F is nonlinear in w).
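Concretely, each outer iteration recomputes the weighting from the current iterate and then solves a weighted nonlinear least squares subproblem; this mirrors the IRWLS_Nonlinear loop in the code below, where R^{(n)} is the Cholesky factor of the regularized covariance returned by RTfun:
R^{(n)}\big(R^{(n)}\big)^T \approx \operatorname{Cov}\big[G(w^{(n)};u)-b\big], \qquad w^{(n+1)} = \arg\min_w \tfrac{1}{2}\big\|\big(R^{(n)\,T}\big)^{-1}\big(G(w;u)-b\big)\big\|_2^2.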

Code and Log

I am following the example shown in the Nonlinear Least Squares Feature Comparison, but something is not working quite right.

I built relatively efficient ways to evaluate r(w) := (R^T)^{-1} (G(w;u) - b) and \nabla_w r(w). When I benchmark these functions with w::Vector{Float64}, they only need tens of allocations, but when I register them as an operator for a Model they incur thousands of allocations (see the log output).

I am attaching code with random data to isolate the issue; in practice the U, V, and sig variables are all computed upstream. The f!, jacuf!, and jacwf! functions come from the Hindmarsh-Rose equation, which has a RHS F that is linear in w, but I am using it as a way to check that the NLS solver is working. These functions are all generated by ModelingToolkit.jl, but I wrote them here in a more human-readable format. Here is the code:

@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt,  Tullio, BenchmarkTools
## 
# Set size of problem 
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10 
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise 
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
## Define necessary functions 
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
    fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3] 
    fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2] 
    fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
    jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
    jfm[2] = 2*w[6]*um[1]
    jfm[3] = w[8]
    jfm[4] = w[1]
    jfm[5] = w[7]
    jfm[6] = 0
    jfm[7] = w[4]
    jfm[8] = 0
    jfm[9] = w[10]
    nothing 
end 
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
    jfm .= 0
    jfm[1] = um[2]
    jfm[4] = (^)(um[1], 3)
    jfm[7] = (^)(um[1], 2)
    jfm[10] = um[3]
    jfm[14] = 1
    jfm[17] = (^)(um[1], 2)
    jfm[20] = um[2]
    jfm[24] = um[1]
    jfm[27] = 1
    jfm[30] = um[3]
    nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
    # Preallocate for L 
    D, M = size(U)
    K, _ = size(V)
    JuF = zeros(D,D,M)
    _L0 = zeros(K,D,D,M)
    _L1  = zeros(K,D,M,D)
    # Get _L1
    K,D,M,_ = size(_L1)
    @inbounds for m in 1:size(JuF,3)
        jacuf!(view(JuF,:,:,m), w, view(U,:,m))
    end
    @tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
    @tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
    permutedims!(_L1,_L0,(1,2,4,3))
    L = reshape(_L1,K*D,M*D)
    # compute covariance
    S = L * L'
    # regularize for possible ill conditioning
    R = (1-diag_reg)*S + diag_reg*I
    # compute cholesky for lin solv efficiency
    cholesky!(Symmetric(R))
    return UpperTriangular(R)'
end
# Define residual function 
function _res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, ::Val{T}, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where {W,T}
    with_logger(ConsoleLogger(stderr, ll)) do 
        K, M = size(V)
        D, _ = size(U)
        @info "++++ Res Eval ++++"
        @info " Evaluate F "
        dt = @elapsed a = @allocations begin 
            F = zeros(T, D, M)
            for m in 1:size(F,2)
                f!(view(F,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Mat Mat mult "
        dt = @elapsed a = @allocations G = V * F'
        @info "  $dt s, $a allocations"
        @info " Reshape "
        dt = @elapsed a = @allocations g = reshape(G, K*D)
        @info "  $dt s, $a allocations"
        @info " Linear Solve "
        dt = @elapsed a = @allocations res = RT \ g
        @info "  $dt s, $a allocations"
        @info " Vec Vec add "
        dt = @elapsed a = @allocations res .-= b
        @info "  $dt s, $a allocations"
        @info "++++++++++++++++++"
        return res
    end
end
# Define Jacobian of residual with respect to the parameters w 
function _jacRes(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, ::Val{T}, w::AbstractVector{W}; showFlag::Bool=false, ll::Logging.LogLevel=Logging.Warn) where {W,T}
    with_logger(ConsoleLogger(stderr, ll)) do 
        showFlag && println(stderr, "Hello")
        K, M = size(V)
        D, _ = size(U)
        J = length(w)
        JwF = zeros(T,D,J,M)
        for m in 1:M
            jacwf!(view(JwF,:,:,m), w, view(U,:,m))
        end
        @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m] 
        JG = permutedims(_JG,(3,1,2))
        jacG = reshape(JG, K*D, J)
        return RT \ jacG
    end
end
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0 
res_float64(w, ll) = _res(RT,U,V,b,f!,Val(Float64), w; ll=ll)
res_float64(w_rand, Logging.Info)
@btime res_float64($w_rand, $Logging.Warn);
nothing
## Compute the non lin least square solution 
@info "Defining IRWLS_Nonlinear..."
function IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Info,maxIt=100, relTol=1e-4)
    with_logger(ConsoleLogger(stderr,ll)) do 
        @info "Initializing the linearization least squares solution  ..."
        D, M = size(U)
        K, _ = size(V)
        G0 = _jacRes(Matrix{Float64}(I, K*D,K*D), U, V, jacwf!, Val(Float64), zeros(J))
        w0 = G0 \ b0 
        wit = zeros(J,maxIt)
        resit = zeros(J,maxIt)
        wnm1 = w0 
        wn = similar(w0)
        for n = 1:maxIt 
            RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,wnm1)
            b = RT \ b0 
            ll = (n == 1) ? Logging.Info : Logging.Warn 
            res_AffExpr(w::AbstractVector) = _res(RT,U,V,b,f!,Val(AffExpr), w; ll=ll)
            jacRes_AffExpr(w::AbstractVector) = _jacRes(RT,U,V,jacwf!,Val(AffExpr),w;showFlag=true)
            w_star = _jacRes(RT, U, V, jacwf!, Val(Float64), zeros(J)) \ b 
            ##
            @info "Defining model for Ipopt"
            mdl = Model(Ipopt.Optimizer)
            @variable(mdl, w[j = 1:J], start = wnm1[j])
            @variable(mdl, r[k = 1:K*D])
            @operator(mdl, f, J, res_AffExpr, jacRes_AffExpr)
            @constraint(mdl, r == f(w))
            @objective(mdl, Min, sum(r.^2 ) ) 
            set_silent(mdl)
            @info "Running optimizer"
            dtn = @elapsed optimize!(mdl)
            wn = value.(w)
            resn = wn - w_star
            relErr = norm(resn) / norm(w_star)
            resit[:,n] = resn
            wit[:,n] = wn
            @info """ Iteration $n
                solve time         = $dtn s
                relative Error     = $relErr
                barrier_iterations = $(barrier_iterations(mdl))
                termination_status = $(termination_status(mdl))
                primal_status      = $(primal_status(mdl))
                objective_value    = $(objective_value(mdl))
            """
            if norm(wnm1 - wn) / norm(wnm1) < relTol
                resit = resit[:,1:n] 
                wit = wit[:,1:n] 
                @info "Convergence Criterion met!"
                return wn, wit, resit 
            end
            wnm1 = wn
        end
        @warn "Maxiteration met..."
        return wn, wit, resit 
    end
end 
##
@info "IRWLS (Nonlinear): "
@info "   Runtime info: "
@time what, wit, resit = IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Warn)
@info "   iterations    = $(size(wit,2)-1)"

Here is the log:

[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with with Float64
[ Info: ++++ Res Eval ++++
[ Info:  Evaluate F 
[ Info:   1.8738e-5 s, 2 allocations
[ Info:  Mat Mat mult 
[ Info:   0.00034676 s, 1 allocations
[ Info:  Reshape 
[ Info:   1.034e-6 s, 2 allocations
[ Info:  Linear Solve 
[ Info:   6.248e-5 s, 1 allocations
[ Info:  Vec Vec add 
[ Info:   1.201e-6 s, 0 allocations
[ Info: ++++++++++++++++++
  223.049 μs (13 allocations: 29.22 KiB)
[ Info: Defining IRWLS_Nonlinear...
[ Info: IRWLS (Nonlinear): 
[ Info:    Runtime info: 
[ Info: ++++ Res Eval ++++
[ Info:  Evaluate F 
[ Info:   0.006429002 s, 114007 allocations
[ Info:  Mat Mat mult 
[ Info:   0.126229552 s, 2401 allocations
[ Info:  Reshape 
[ Info:   9.96e-7 s, 2 allocations
[ Info:  Linear Solve 
[ Info:   0.083073424 s, 596550 allocations
[ Info:  Vec Vec add 
[ Info:   0.000407006 s, 2600 allocations
[ Info: ++++++++++++++++++
  3.877045 seconds (8.04 M allocations: 888.491 MiB, 3.14% gc time, 41.48% compilation time)
[ Info:    iterations    = 8

To avoid clutter, the log only prints the residual evaluation once for Float64 and once for AffExpr, but the pattern continues after the first iteration.

Possible Diagnosis

  • I think that Ipopt is not using my gradient: it should be printing “Hello” but is not. I would like the solver to use my analytic gradient.
  • I think Ipopt is trying to do automatic differentiation and thus is creating overhead.

What do you think?

Alternatives to Ipopt?

Returning to the comparison of NLS solvers, it seems like NLLSsolver.jl is currently the most efficient / lightweight option, but the lack of documentation and the possible lack of future maintenance of the code base make me hesitate to use it.

I tried to use the JSOSolver.jl package but had a hard time passing my user-defined functions to the solver. Any help here would be appreciated as well.

Are there any other options you would suggest?

1 Like

Does this really work and not throw an error?

You have multiple things that should be erroring:

  • The output must be a scalar, we do not support vector-valued operators
  • res_AffExpr accepts a vector as input, when it should accept the splatted syntax
  • jacRes_AffExpr returns the gradient, instead of filling in-place.

I don’t think your code is calling the JuMP part at all.

You might want to read Nonlinear Modeling · JuMP
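Roughly, the operator API that page describes looks like the following minimal sketch (a toy function g, not your residual): scalar output, splatted scalar inputs, and a gradient written in-place.

using JuMP, Ipopt

g(x...) = sum(xi^2 for xi in x)           # scalar output, splatted scalar inputs
function ∇g!(grad::AbstractVector, x...)  # gradient filled in-place
    for i in eachindex(grad)
        grad[i] = 2 * x[i]
    end
    return
end

model = Model(Ipopt.Optimizer)
@variable(model, x[1:3] >= 1)
@operator(model, op_g, 3, g, ∇g!)
@objective(model, Min, op_g(x...))
optimize!(model)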

Hopefully, I am addressing your comments @odow.

Am I Really Using JuMP?

I ran the exact script provided in a fresh REPL, and the log in the original post shows the results. When I exclude JuMP from my using statements, I get an error that @variable is not defined, so I do believe that I am using the JuMP interface. Here is my package environment; I believe I am using the latest stable release of all packages:

(WENDy) pkg> st
Project NLPMLE v1.0.0-DEV
Status `~/.julia/dev/WENDy/Project.toml`
  [fbb218c0] BSON v0.3.9
  [0a1fb500] BlockDiagonals v0.1.42
  [7a1cc6ca] FFTW v1.8.0
  [6a3955dd] ImageFiltering v0.7.8
  [a98d9a8b] Interpolations v0.15.1
  [b964fa9f] LaTeXStrings v1.3.1
  [23992714] MAT v0.10.7
  [33e6dc65] MKL v0.6.3
  [961ee093] ModelingToolkit v9.17.1
  [1dea7af3] OrdinaryDiffEq v6.82.0
  [f0f68f2c] PlotlyJS v0.18.13
  [91a5bcdd] Plots v1.40.4
  [295af30f] Revise v3.5.14
  [90137ffa] StaticArrays v1.9.5
  [0c5d862f] Symbolics v5.30.3
  [b8865327] UnicodePlots v3.6.4
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [9a3f8284] Random
  [10745b16] Statistics v1.10.0

Using NonLinear Modeling Interface

I did read through the Nonlinear Operators documentation several times, but it was not clear how best to proceed because, depending on the situation, I will have:

  • residuals of different lengths
  • variables of different lengths

Point taken though: I can try to make several nonlinear operators with scalar outputs that have gradients rather than a Jacobian.

I saw that [Tips and tricks · JuMP] suggests using a memoized function to handle operators with multiple outputs, so I attempted that workaround. It still adds allocations and run time, which is frustrating.

Also, I found this post from 2020 that shows how to approach this problem by building an expression, which matches the legacy syntax. I used the legacy syntax and now have faster code, but there is still considerable overhead from the memoized functions for the gradient computations.
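For context, the core of that workaround, distilled from the docs pattern (generic names, not the exact code below), is to cache the last input and the full vector output and expose one scalar function per output index:

# Distilled memoization pattern (generic names): cache the last splatted input
# and the corresponding full output vector, and hand the solver one
# scalar-output closure per component.
function memoize(foo::Function, n_outputs::Int)
    last_x, last_fx = nothing, nothing
    function foo_i(i::Int, x::T...) where {T<:Real}
        if x !== last_x
            last_x, last_fx = x, foo(x...)
        end
        return last_fx[i]::T
    end
    return [(x...) -> foo_i(i, x...) for i in 1:n_outputs]
end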

@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt,  Tullio, BenchmarkTools
## 
# Set size of problem 
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10 
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise 
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
nothing
## Define necessary functions 
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
    fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3] 
    fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2] 
    fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
    jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
    jfm[2] = 2*w[6]*um[1]
    jfm[3] = w[8]
    jfm[4] = w[1]
    jfm[5] = w[7]
    jfm[6] = 0
    jfm[7] = w[4]
    jfm[8] = 0
    jfm[9] = w[10]
    nothing 
end 
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
    jfm .= 0
    jfm[1] = um[2]
    jfm[4] = (^)(um[1], 3)
    jfm[7] = (^)(um[1], 2)
    jfm[10] = um[3]
    jfm[14] = 1
    jfm[17] = (^)(um[1], 2)
    jfm[20] = um[2]
    jfm[24] = um[1]
    jfm[27] = 1
    jfm[30] = um[3]
    nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
    # Preallocate for L 
    D, M = size(U)
    K, _ = size(V)
    JuF = zeros(D,D,M)
    _L0 = zeros(K,D,D,M)
    _L1  = zeros(K,D,M,D)
    # Get _L1
    K,D,M,_ = size(_L1)
    @inbounds for m in 1:M
        jacuf!(view(JuF,:,:,m), w, view(U,:,m))
    end
    @tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
    @tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
    permutedims!(_L1,_L0,(1,2,4,3))
    L = reshape(_L1,K*D,M*D)
    # compute covariance
    S = L * L'
    # regularize for possible ill conditioning
    R = (1-diag_reg)*S + diag_reg*I
    # compute cholesky for lin solv efficiency
    cholesky!(Symmetric(R))
    return UpperTriangular(R)'
end
# Define residual function 
function res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        @info " Evaluate F "
        dt = @elapsed a = @allocations begin 
            F = zeros(W, D, M)
            for m in 1:size(F,2)
                f!(view(F,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Mat Mat mult "
        dt = @elapsed a = @allocations G = V * F'
        @info "  $dt s, $a allocations"
        @info " Reshape "
        dt = @elapsed a = @allocations g = reshape(G, K*D)
        @info "  $dt s, $a allocations"
        @info " Linear Solve "
        dt = @elapsed a = @allocations res = RT \ g
        @info "  $dt s, $a allocations"
        @info " Vec Vec add "
        dt = @elapsed a = @allocations res .-= b
        @info "  $dt s, $a allocations"
        @info "++++++++++++++++++"
        return res
    end
end
# Define Jacobian of residual with respect to the parameters w 
function ∇res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Jac Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        J = length(w)
        @info " Evaluating jacobian of F"
        dt = @elapsed a = @allocations begin
            JwF = zeros(W,D,J,M)
            for m in 1:M
                jacwf!(view(JwF,:,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Computing V ×_3 jacF with tullio"
        dt = @elapsed a = @allocations @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m] 
        @info "  $dt s, $a allocations"
        @info " permutedims"
        dt = @elapsed a = @allocations JG = permutedims(_JG,(3,1,2))
        @info "  $dt s, $a allocations"
        @info " Reshape"
        dt = @elapsed a = @allocations jacG = reshape(JG, K*D, J)
        @info "  $dt s, $a allocations"
        @info " Linsolve"
        dt = @elapsed a = @allocations ∇res = RT \ jacG
        @info "  $dt s, $a allocations"
        return ∇res
    end
end
# Memoize the vector-valued residual: cache the last input w and the full output
# vector, then expose one scalar-output closure per residual index kd
function memorize_f(f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
    _w, _f = NTuple{J,W}(zeros(W,J)), zeros(W,J)
    function f_kd(kd::Int, wargs::W...)
        if wargs !== _w
                _w, _f = wargs, f(wargs...)
        end
        return  _f[kd]::W 
    end
    return [(wargs...) -> f_kd(kd, wargs...) for kd in 1:KD]
end
 
# Memoize the Jacobian of the residual: cache the last input w and the full
# Jacobian, then expose one in-place gradient closure (row kd) per residual index
function memorize_∇f!(∇f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
    _w, _∇f = NTuple{J,W}(zeros(W,J)), zeros(W,KD,J)
    function ∇f_kd!(ret::AbstractVector{W}, kd::Int, wargs::W...)
        if wargs !== _w
            _w = wargs
            _∇f = ∇f(wargs...)
        end
        ret .= view(_∇f, kd, :)  
        nothing 
    end
    return [(∇f_kd, wargs...) -> ∇f_kd!(∇f_kd, kd, wargs...) for kd in 1:KD]
end
#
res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = res(RT,U,V,b,f!,w;ll=ll)
∇res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(RT,U,V,jacwf!,w;ll=ll)
res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = res(collect(wargs);ll=ll)
∇res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(collect(wargs);ll=ll)
res_kd = memorize_f(res, K*D, Val(J), Val(eltype(U)))
∇res_kd = memorize_∇f!(∇res, K*D, Val(J), Val(eltype(U)))
res_mem(wargs::W...) where W = reduce(vcat, res_kd[kd](wargs...) for kd in 1:K*D)
function ∇res_mem!(∇res::AbstractMatrix{W}, wargs::W...) where W
    for kd in 1:K*D
        ∇res_kd[kd](view(∇res,kd,:), wargs...) 
    end
    nothing
end 
## 
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0 
# because this problem is linear in w the jacobian is constant, 
# and we could solve this problem with backslash because 
# it is really a linear least squares problem
w_star = ∇res(zeros(J)) \ b 
##
res_w_rand = res(w_rand)
res_w_rand_splat = res(w_rand...)
res_mem_w_rand = res_mem(w_rand...)
@assert all(res_w_rand .== res_w_rand_splat)
@assert all(res_w_rand .== res_mem_w_rand)
∇res_w_rand = ∇res(w_rand)
∇res_w_rand_splat = ∇res(w_rand...) 
∇res_mem_w_rand = zeros(K*D,J)
∇res_mem!(∇res_mem_w_rand,w_rand...)
@assert all(∇res_w_rand .== ∇res_w_rand_splat)
@assert all(∇res_w_rand .== ∇res_mem_w_rand)
nothing 
##
@info " Benchmark res"
@time res(rand(J))
@info " Benchmark res splatting"
@time res(rand(J)...)
@info " Benchmark res memorized"
@time res_mem(rand(J)...)
@info " Benchmark ∇res "
@time ∇res(rand(J))
@info " Benchmark ∇res splatting"
@time ∇res(rand(J)...) 
@info " Benchmark ∇res memorized"
∇res_mem_w_rand = zeros(K*D,J)
@time ∇res_mem!(∇res_mem_w_rand,rand(J)...);
nothing
##
@info "Test the NLS in isolation"
@info "Defining model for Ipopt"
mdl = Model(Ipopt.Optimizer)
@variable(mdl, w[j = 1:J], start = w_rand[j])
@variable(mdl, r[k = 1:K*D])
for kd in 1:K*D
    fk = Symbol("f$kd")
    register(mdl, fk, J, res_kd[kd], ∇res_kd[kd];autodiff=false)
    add_nonlinear_constraint(mdl, :($(fk)($(w)...) == $(r[kd])) )
end
@objective(mdl, Min, sum(r .^2 ) ) 
set_silent(mdl)
@info "Running optimizer"
@time optimize!(mdl)
w_hat = value.(w)
abserr = norm(w_hat - w_star)
relerr = abserr / norm(w_star)
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
obj_star = sum(res(w_star...).^2)
abserr = abs(objective_value(mdl) - obj_star)
relerr = abserr / obj_star
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"

With log

[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with with Float64
[ Info:  Benchmark res
  0.000531 seconds (14 allocations: 29.359 KiB)
[ Info:  Benchmark res splatting
  0.000415 seconds (37 allocations: 29.922 KiB)
[ Info:  Benchmark res memorized
  0.000937 seconds (8.14 k allocations: 593.938 KiB)
[ Info:  Benchmark ∇res 
  0.002232 seconds (65 allocations: 308.188 KiB)
[ Info:  Benchmark ∇res splatting
  0.001899 seconds (88 allocations: 308.750 KiB)
[ Info:  Benchmark ∇res memorized
  0.003405 seconds (10.59 k allocations: 613.578 KiB)

[ Info: Test the NLS in isolation
[ Info: Defining model for Ipopt
[ Info: Running optimizer
  0.314099 seconds (1.02 M allocations: 38.915 MiB, 4.15% gc time)
[ Info: Absolute coeff error = 8.351175528130465e-10
[ Info: Relative coeff error = 1.8051495629515027e-10
[ Info: Absolute coeff error = 0.0
[ Info: Relative coeff error = 0.0

Questions

  • When I tried to use the new @operator and @constraint functions, I was getting errors deep inside MathOptInterface. How would you write equivalent code using the updated syntax? (See the sketch after this list for what I was attempting.)
  • This is faster than it was originally and has fewer allocations, but it is still slower than I would have hoped. Why is that? Is there more to be done to optimize the memoized functions? It seems like a bummer to need a hack like this to solve a problem with a “large” residual and many variables. In the big scheme of things, hundreds of variables is not large, and this will scale poorly as the problem size grows.
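What I was attempting with the updated syntax looked roughly like this (a sketch; the op_f names are placeholders, and add_nonlinear_operator is used instead of @operator because the names are generated in a loop):

# Register one scalar-output operator per residual component and constrain it
# to the corresponding r[kd] (roughly what I was doing when I hit the MOI errors).
for kd in 1:K*D
    op_kd = add_nonlinear_operator(
        mdl, J, res_kd[kd], ∇res_kd[kd];
        name = Symbol("op_f$kd"),
    )
    @constraint(mdl, op_kd(w...) == r[kd])
end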

I don’t think JuMP and user-defined operators are the right choice for this problem.

See Should you use JuMP? · JuMP

You cannot register vector-valued user-defined operators, and you cannot register operators with a variable number of arguments.

2 Likes

Sorry for wasting your time @odow; I think I was using the wrong tool for the job. I was looking for something more like Optim.jl.

If someone finds this post and wants to see how I used Optim for this particular example:

@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using Optim, Tullio, BenchmarkTools
using Optim: minimizer, minimum, iterations, iteration_limit_reached
## 
# Set size of problem 
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10 
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise 
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
nothing
## Define necessary functions 
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
    fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3] 
    fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2] 
    fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
    jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
    jfm[2] = 2*w[6]*um[1]
    jfm[3] = w[8]
    jfm[4] = w[1]
    jfm[5] = w[7]
    jfm[6] = 0
    jfm[7] = w[4]
    jfm[8] = 0
    jfm[9] = w[10]
    nothing 
end 
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
    jfm .= 0
    jfm[1] = um[2]
    jfm[4] = (^)(um[1], 3)
    jfm[7] = (^)(um[1], 2)
    jfm[10] = um[3]
    jfm[14] = 1
    jfm[17] = (^)(um[1], 2)
    jfm[20] = um[2]
    jfm[24] = um[1]
    jfm[27] = 1
    jfm[30] = um[3]
    nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
    # Preallocate for L 
    D, M = size(U)
    K, _ = size(V)
    JuF = zeros(D,D,M)
    _L0 = zeros(K,D,D,M)
    _L1  = zeros(K,D,M,D)
    # Get _L1
    K,D,M,_ = size(_L1)
    @inbounds for m in 1:M
        jacuf!(view(JuF,:,:,m), w, view(U,:,m))
    end
    @tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
    @tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
    permutedims!(_L1,_L0,(1,2,4,3))
    L = reshape(_L1,K*D,M*D)
    # compute covariance
    S = L * L'
    # regularize for possible ill conditioning
    R = (1-diag_reg)*S + diag_reg*I
    # compute cholesky for lin solv efficiency
    cholesky!(Symmetric(R))
    return UpperTriangular(R)'
end
# 
function res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        @info " Evaluate F "
        dt = @elapsed a = @allocations begin 
            F = zeros(W, D, M)
            for m in 1:size(F,2)
                f!(view(F,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Mat Mat mult "
        dt = @elapsed a = @allocations G = V * F'
        @info "  $dt s, $a allocations"
        @info " Reshape "
        dt = @elapsed a = @allocations g = reshape(G, K*D)
        @info "  $dt s, $a allocations"
        @info " Linear Solve "
        dt = @elapsed a = @allocations res = RT \ g
        @info "  $dt s, $a allocations"
        @info " Vec Vec add "
        dt = @elapsed a = @allocations res .-= b
        @info "  $dt s, $a allocations"
        @info "++++++++++++++++++"
        return res
    end
end

function ∇res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Jac Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        J = length(w)
        @info " Evaluating jacobian of F"
        dt = @elapsed a = @allocations begin
            JwF = zeros(W,D,J,M)
            for m in 1:M
                jacwf!(view(JwF,:,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Computing V ×_3 jacF with tullio"
        dt = @elapsed a = @allocations @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m] 
        @info "  $dt s, $a allocations"
        @info " permutedims"
        dt = @elapsed a = @allocations JG = permutedims(_JG,(3,1,2))
        @info "  $dt s, $a allocations"
        @info " Reshape"
        dt = @elapsed a = @allocations jacG = reshape(JG, K*D, J)
        @info "  $dt s, $a allocations"
        @info " Linsolve"
        dt = @elapsed a = @allocations ∇res = RT \ jacG
        @info "  $dt s, $a allocations"
        return ∇res
    end
end
# Define cost function 
function f(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    1/2 * norm(res(RT, U, V, b, f!, w; ll=ll))^2
end

# Gradient of the cost f(w) = 1/2‖res(w)‖²: ∇f(w) = ∇res(w)' * res(w)
function ∇f!(∇f::AbstractVector, RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, jacwf!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    ∇f .= ∇res(RT,U,V,jacwf!,w;ll=ll)' * res(RT, U, V, b, f!, w; ll=ll) 
    nothing
end
#
f(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = f(RT,U,V,b,f!,w;ll=ll)
∇f!(∇f::AbstractVector, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = ∇f!(∇f,RT,U,V,b,jacwf!,w;ll=ll)
## 
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0 
# because this problem is linear in w the jacobian is constant, 
# and we could solve this problem with backslash because 
# it is really a linear least squares problem
G = ∇res(RT,U,V,jacwf!,zeros(J))
w_star = G \ b 
##
res_w_rand = f(w_rand)
∇f_rand = zeros(J)
∇f!(∇f_rand,w_rand)
@assert !all(∇f_rand .== 0)
nothing 
##
@info " Benchmark f"
@time f(rand(J))
@info " Benchmark f! "
@time ∇f!(∇f_rand,rand(J))
nothing
##
@info "Test the NLS in isolation"
@time result = optimize(f, ∇f!, w_rand, LBFGS())
w_hat = minimizer(result)
f_hat = minimum(result)
iter = iterations(result)
iterFlag = iteration_limit_reached(result)
abserr = norm(w_star-w_hat)
relerr = abserr / norm(w_star)
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
abserr = abs(f(w_star)- f_hat)
relerr = abs(f(w_star)- f_hat) / f(w_star)
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
## Compute the non lin least square solution 
@info "Defining IRWLS_Nonlinear..."
function IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Info,maxIt=100, relTol=1e-10)
    with_logger(ConsoleLogger(stderr,ll)) do 
        @info "Initializing the linearization least squares solution  ..."
        D, M = size(U)
        K, _ = size(V)
        G0 = ∇res(Matrix{Float64}(I, K*D,K*D), U, V, jacwf!, zeros(J))
        w0 = G0 \ b0 
        wit = zeros(J,maxIt)
        resit = zeros(J,maxIt)
        wnm1 = w0 
        wn = similar(w0)
        for n = 1:maxIt 
            RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,wnm1)
            b = RT \ b0 
            G = ∇res(RT,U,V,jacwf!,zeros(J))
            w_star = G \ b 
            fn(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = f(RT,U,V,b,f!,w;ll=ll)
            ∇fn!(∇f::AbstractVector, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = ∇f!(∇f,RT,U,V,b,jacwf!,w;ll=ll)
            @info "Running local optimization method"
            dt = @elapsed a = @allocations result = optimize(fn, ∇fn!, w_rand, LBFGS())
            wn = minimizer(result)
            resn = wnm1-wn
            resit[:,n] = resn
            wit[:,n] = wn
            relErr = norm(w_star-wn) / norm(wn)
            @info """ Iteration $n
                iteration time     = $dt
                allocations        = $a
                relative Error     = $relErr
                iterations         = $(iterations(result))
                max iteration flag = $(iteration_limit_reached(result))
                objective_value    = $(minimum(result))
            """
            if norm(resn) / norm(wnm1) < relTol
                resit = resit[:,1:n] 
                wit = wit[:,1:n] 
                @info "Convergence Criterion met!"
                return wn, wit, resit 
            end
            wnm1 = wn
        end
        @warn "Maxiteration met..."
        return wn, wit, resit 
    end
end 
##
@info "IRWLS (Nonlinear): "
@info "   Runtime info: "
@time what, wit, resit = IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Warn)
@info "   iterations    = $(size(wit,2)-1)"

with log

[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with with Float64
[ Info:  Benchmark f
  0.000380 seconds (17 allocations: 29.406 KiB)
[ Info:  Benchmark f! 
  0.002697 seconds (81 allocations: 337.578 KiB)
[ Info: Test the NLS in isolation
  0.089266 seconds (3.30 k allocations: 11.473 MiB)
[ Info: Absolute coeff error = 6.482483884281083e-14
[ Info: Relative coeff error = 1.401222248428754e-14
[ Info: Absolute coeff error = 5.684341886080802e-14
[ Info: Relative coeff error = 3.614986906024396e-16
[ Info: Defining IRWLS_Nonlinear...
[ Info: IRWLS (Nonlinear): 
[ Info:    Runtime info: 
  2.014414 seconds (72.40 k allocations: 569.139 MiB, 2.54% gc time)
[ Info:    iterations    = 19
1 Like

Questions here are never a waste :smile:

I should think about how we can better communicate when JuMP is the wrong tool for the job. Questions similar to yours come up semi-regularly, which means many more people must be having the same issue and not reporting it.

1 Like

Thanks for your help! I used JuMP about three years ago for problems that better fit its use case, so I thought of it first when implementing this algorithm in Julia. It is a lesson for me to be more thoughtful during the planning phase of algorithm development. Overall, I thought the JuMP documentation you pointed me to was well written and concise. Thanks for what you do.

1 Like

Did you try NonlinearSolve’s NonlinearLeastSquaresProblem? Optim is pretty expensive for this kind of problem.

2 Likes

Because there is noise in the system, and because we are truncating a Taylor series in order to maintain Gaussianity of the distribution of the residual, I don't think we will have equality. So I think I am looking to minimize a norm of the residual. It appears NonlinearSolve is for finding a root (equality) of a nonlinear system of equations. Am I interpreting this wrong? Can NonlinearSolve be used to find a minimum with respect to some cost function (norm)?

If anything, I think I am going to have to go for something more heavy-duty, like NLopt through Optimization.jl, to handle problems where simple local optimization methods (like BFGS) are insufficient.
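For what it's worth, wiring the cost and gradient above into Optimization.jl would look roughly like this (a sketch reusing the f and ∇f! defined in my Optim example; an NLopt algorithm could be swapped in via OptimizationNLopt):

using Optimization, OptimizationOptimJL

# Optimization.jl expects f(u, p) and grad!(G, u, p); wrap the existing closures.
optf = OptimizationFunction((w, p) -> f(w); grad = (g, w, p) -> ∇f!(g, w))
prob = OptimizationProblem(optf, w_rand)
sol = solve(prob, LBFGS())
w_hat = sol.u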

Yes, you are interpreting it wrong. Solving a nonlinear system of equations with more/fewer equations than unknowns can be interpreted as a least squares problem. If you have a least squares problem, then a general nonlinear optimizer is heavy. Least squares problems can be reduced to Newton methods with QR factorizations, so you get quadratic convergence with first-order derivatives, rather than requiring Hessians (second derivatives) for a purely nonlinear optimization routine via Newton methods, or settling for first-order convergence with gradient-based methods; the exception is BFGS, which is then just a Broyden method (so a nonlinear solver) similarly coerced into nonlinear optimization.
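Concretely, writing r(w) for the residual and J(w) for its Jacobian, the Gauss–Newton step only needs first derivatives:
\min_\delta \big\| J(w_n)\,\delta + r(w_n) \big\|_2, \qquad J(w_n) = QR \;\Rightarrow\; \delta = -R^{-1} Q^T r(w_n), \qquad w_{n+1} = w_n + \delta,
which approximates the Newton step on \tfrac{1}{2}\|r(w)\|_2^2 without ever forming a Hessian.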

So basically, lots of nonlinear optimization routines are built from nonlinear solvers, but there are specializations you can do if it's just a nonlinear least squares problem.
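For the residual in this thread, posing it for NonlinearSolve.jl would look roughly like the sketch below (reusing the res and ∇res from the Optim example; the in-place (out, u, p) convention and resid_prototype keyword follow NonlinearSolve's NonlinearFunction interface, so treat this as a starting point rather than a tuned setup):

using NonlinearSolve

# In-place residual and Jacobian in the (out, u, p) convention; u = w and the
# fixed data (RT, U, V, b, ...) is closed over from the surrounding scope.
residual!(r_out, w, _p) = (r_out .= res(RT, U, V, b, f!, w); nothing)
jacobian!(J_out, w, _p) = (J_out .= ∇res(RT, U, V, jacwf!, w); nothing)

prob = NonlinearLeastSquaresProblem(
    NonlinearFunction(residual!; jac = jacobian!, resid_prototype = zeros(K * D)),
    w_rand,
)
sol = solve(prob, LevenbergMarquardt())
w_hat = sol.u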

So I could say use Optimization.jl, but since the OP specifically noted that the problem is NLS, you should use NonlinearSolve.jl. It’s well-benchmarked and the performance is understood, see:

4 Likes