# Overview

An algorithm I am working on requires solving many largish nonlinear least squares problems, and the number of allocations is exploding. My nonlinear functions are efficient when passed `Array`s of `Float64` and generate few allocations, but when they are passed as an operator to Ipopt and evaluated with `VariableRef` or `AffExpr` arguments, the evaluation time and number of allocations explode.

I can provide an analytic gradient that is also efficient to compute with `Float64`s, so automatic differentiation should not be necessary, but I do not see my gradient function being evaluated. I think this is part of why the code is so slow.

I am looking for advice on how to move forward to make solving non-linear least squares problems in Julia as fast as possible.

I should mention that I am using Julia v1.10.4 on an Intel Mac.

## Background

You can probably skip this if you don’t care about the motivation and just want to see the coding problem.

I am trying to solve this problem many times because it arises within an iteratively re-weighted nonlinear least squares scheme:

$$
\min_w \tfrac{1}{2} \big\|(R^T)^{-1}\big(G(w;u)-b\big)\big\|_2^2 .
$$

This comes from trying to approximate the parameters of a differential equation of the form

$$
\dot{u}(t) = F(u(t), w)
$$

where u(t) is the dependent variable (possibly a vector) and w are the parameters. I am extending the WENDy algorithm to handle ODEs (and then PDEs) that are nonlinear in the parameters (i.e., the RHS F is nonlinear in w).

## Code and Log

I am following the example shown in the Nonlinear Least Squares Feature Comparison, but something is not working quite right.

I built relatively efficient ways to evaluate r(w) := (R^T)^{-1}(G(w;u) - b) and its Jacobian \nabla_w r(w). When I benchmark these functions with `w::Vector{Float64}`, they take only tens of allocations, but when I register them as an operator for a `Model`, they take thousands of allocations (see the log output).
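
As a toy illustration of the kind of blow-up I mean (the function `g` below is a hypothetical stand-in, not my actual residual), evaluating even a simple function on `VariableRef`s allocates far more than the same evaluation on `Float64`s, because every arithmetic operation builds new expression objects:

```julia
using JuMP, BenchmarkTools

# Hypothetical toy function, just to illustrate the pattern I am seeing
g(w) = sum(w[i]^2 for i in eachindex(w))

w64 = rand(10)
@btime g($w64)     # only a handful of allocations (if any) with Float64

toy = Model()
@variable(toy, w_aff[1:10])
@btime g($w_aff)   # many more allocations: each `^` and `+` builds a new symbolic expression
```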

I am attaching code with random data to isolate the issue, but in practice the `U`, `V`, and `sig` variables are all computed upstream. The `f!`, `jacuf!`, and `jacwf!` functions come from the Hindmarsh-Rose equations, whose RHS F is linear in w; I am using this case to check that the NLS solver is working. These functions are all generated by `ModelingToolkit.jl`, but I wrote them out here in a more human-readable format.
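For context, my understanding is that these generated functions come out of the Symbolics.jl machinery that ModelingToolkit.jl builds on; a sketch of how they could be produced (not my exact generation code) would look something like this:

```julia
using Symbolics

@variables w[1:10] u[1:3]
w, u = collect(w), collect(u)   # scalarize the symbolic arrays
# Hindmarsh-Rose right hand side, written symbolically
F = [w[1]*u[2] + w[2]*u[1]^3 + w[3]*u[1]^2 + w[4]*u[3],
     w[5] + w[6]*u[1]^2 + w[7]*u[2],
     w[8]*u[1] + w[9] + w[10]*u[3]]
Ju = Symbolics.jacobian(F, u)   # D x D Jacobian with respect to the state
Jw = Symbolics.jacobian(F, w)   # D x J Jacobian with respect to the parameters
# build_function returns (out-of-place, in-place) pairs of compiled functions;
# the in-place versions have the same call signature as f!, jacuf!, jacwf! below
_, f!     = build_function(F,  w, u; expression = Val(false))
_, jacuf! = build_function(Ju, w, u; expression = Val(false))
_, jacwf! = build_function(Jw, w, u; expression = Val(false))
```

Here is the full script with random data: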

```julia
@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt, Tullio, BenchmarkTools
##
# Set size of problem
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
## Define necessary functions
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3]
fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2]
fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
jfm[2] = 2*w[6]*um[1]
jfm[3] = w[8]
jfm[4] = w[1]
jfm[5] = w[7]
jfm[6] = 0
jfm[7] = w[4]
jfm[8] = 0
jfm[9] = w[10]
nothing
end
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
jfm .= 0
jfm[1] = um[2]
jfm[4] = (^)(um[1], 3)
jfm[7] = (^)(um[1], 2)
jfm[10] = um[3]
jfm[14] = 1
jfm[17] = (^)(um[1], 2)
jfm[20] = um[2]
jfm[24] = um[1]
jfm[27] = 1
jfm[30] = um[3]
nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
# Preallocate for L
D, M = size(U)
K, _ = size(V)
JuF = zeros(D,D,M)
_L0 = zeros(K,D,D,M)
_L1 = zeros(K,D,M,D)
# Get _L1
K,D,M,_ = size(_L1)
@inbounds for m in 1:size(JuF,3)
jacuf!(view(JuF,:,:,m), w, view(U,:,m))
end
@tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
@tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
permutedims!(_L1,_L0,(1,2,4,3))
L = reshape(_L1,K*D,M*D)
# compute covariance
S = L * L'
# regularize for possible ill conditioning
R = (1-diag_reg)*S + diag_reg*I
# compute cholesky for lin solv efficiency
cholesky!(Symmetric(R))
return UpperTriangular(R)'
end
# Define residual function
function _res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, ::Val{T}, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where {W,T}
with_logger(ConsoleLogger(stderr, ll)) do
K, M = size(V)
D, _ = size(U)
@info "++++ Res Eval ++++"
@info " Evaluate F "
dt = @elapsed a = @allocations begin
F = zeros(T, D, M)
for m in 1:size(F,2)
f!(view(F,:,m), w, view(U,:,m))
end
end
@info " $dt s, $a allocations"
@info " Mat Mat mult "
dt = @elapsed a = @allocations G = V * F'
@info " $dt s, $a allocations"
@info " Reshape "
dt = @elapsed a = @allocations g = reshape(G, K*D)
@info " $dt s, $a allocations"
@info " Linear Solve "
dt = @elapsed a = @allocations res = RT \ g
@info " $dt s, $a allocations"
@info " Vec Vec add "
dt = @elapsed a = @allocations res .-= b
@info " $dt s, $a allocations"
@info "++++++++++++++++++"
return res
end
end
# Define Jacobian of residual with respect to the parameters w
function _jacRes(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, ::Val{T}, w::AbstractVector{W}; showFlag::Bool=false, ll::Logging.LogLevel=Logging.Warn) where {W,T}
with_logger(ConsoleLogger(stderr, ll)) do
showFlag && println(stderr, "Hello")
K, M = size(V)
D, _ = size(U)
J = length(w)
JwF = zeros(T,D,J,M)
for m in 1:M
jacwf!(view(JwF,:,:,m), w, view(U,:,m))
end
@tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m]
JG = permutedims(_JG,(3,1,2))
jacG = reshape(JG, K*D, J)
return RT \ jacG
end
end
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0
res_float64(w, ll) = _res(RT,U,V,b,f!,Val(Float64), w; ll=ll)
res_float64(w_rand, Logging.Info)
@btime res_float64($w_rand, $Logging.Warn);
nothing
## Compute the nonlinear least squares solution
@info "Defining IRWLS_Nonlinear..."
function IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Info,maxIt=100, relTol=1e-4)
with_logger(ConsoleLogger(stderr,ll)) do
@info "Initializing the linearization least squares solution ..."
D, M = size(U)
K, _ = size(V)
G0 = _jacRes(Matrix{Float64}(I, K*D,K*D), U, V, jacwf!, Val(Float64), zeros(J))
w0 = G0 \ b0
wit = zeros(J,maxIt)
resit = zeros(J,maxIt)
wnm1 = w0
wn = similar(w0)
for n = 1:maxIt
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,wnm1)
b = RT \ b0
ll = (n == 1) ? Logging.Info : Logging.Warn
res_AffExpr(w::AbstractVector) = _res(RT,U,V,b,f!,Val(AffExpr), w; ll=ll)
jacRes_AffExpr(w::AbstractVector) = _jacRes(RT,U,V,jacwf!,Val(AffExpr),w;showFlag=true)
w_star = _jacRes(RT, U, V, jacwf!, Val(Float64), zeros(J)) \ b
##
@info "Defining model for Ipopt"
mdl = Model(Ipopt.Optimizer)
@variable(mdl, w[j = 1:J], start = wnm1[j])
@variable(mdl, r[k = 1:K*D])
@operator(mdl, f, J, res_AffExpr, jacRes_AffExpr)
@constraint(mdl, r == f(w))
@objective(mdl, Min, sum(r.^2 ) )
set_silent(mdl)
@info "Running optimizer"
dtn = @elapsed optimize!(mdl)
wn = value.(w)
resn = wn - w_star
relErr = norm(resn) / norm(w_star)
resit[:,n] = resn
wit[:,n] = wn
@info """ Iteration $n
solve time = $dtn s
relative Error = $relErr
barrier_iterations = $(barrier_iterations(mdl))
termination_status = $(termination_status(mdl))
primal_status = $(primal_status(mdl))
objective_value = $(objective_value(mdl))
"""
if norm(wnm1 - wn) / norm(wnm1) < relTol
resit = resit[:,1:n]
wit = wit[:,1:n]
@info "Convergence Criterion met!"
return wn, wit, resit
end
wnm1 = wn
end
@warn "Maxiteration met..."
return wn, wit, resit
end
end
##
@info "IRWLS (Nonlinear): "
@info " Runtime info: "
@time what, wit, resit = IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Warn)
@info " iterations = $(size(wit,2)-1)"
```

Here is the log:

```
[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with with Float64
[ Info: ++++ Res Eval ++++
[ Info: Evaluate F
[ Info: 1.8738e-5 s, 2 allocations
[ Info: Mat Mat mult
[ Info: 0.00034676 s, 1 allocations
[ Info: Reshape
[ Info: 1.034e-6 s, 2 allocations
[ Info: Linear Solve
[ Info: 6.248e-5 s, 1 allocations
[ Info: Vec Vec add
[ Info: 1.201e-6 s, 0 allocations
[ Info: ++++++++++++++++++
223.049 μs (13 allocations: 29.22 KiB)
[ Info: Defining IRWLS_Nonlinear...
[ Info: IRWLS (Nonlinear):
[ Info: Runtime info:
[ Info: ++++ Res Eval ++++
[ Info: Evaluate F
[ Info: 0.006429002 s, 114007 allocations
[ Info: Mat Mat mult
[ Info: 0.126229552 s, 2401 allocations
[ Info: Reshape
[ Info: 9.96e-7 s, 2 allocations
[ Info: Linear Solve
[ Info: 0.083073424 s, 596550 allocations
[ Info: Vec Vec add
[ Info: 0.000407006 s, 2600 allocations
[ Info: ++++++++++++++++++
3.877045 seconds (8.04 M allocations: 888.491 MiB, 3.14% gc time, 41.48% compilation time)
[ Info: iterations = 8
```

To avoid clutter, the log prints the residual evaluation test with `Float64`s and with `AffExpr` only once, but the pattern continues after the first iteration.

## Possible Diagnosis

- I think that Ipopt is not using my analytic gradient: the `showFlag` in `jacRes_AffExpr` should be printing "Hello" but it is not. I would like the solver to use my analytic gradient (a sketch of the pattern I think I should be following is below).
- I think Ipopt is instead falling back to automatic differentiation, which creates the overhead.
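
For reference, my reading of the JuMP documentation on user-defined operators is that an operator with a hand-written gradient must be scalar-valued and take splatted scalar arguments, with the gradient filling a buffer in place. A minimal sketch of that pattern (toy objective, not my residual) is:

```julia
using JuMP, Ipopt

# Toy scalar-valued operator with an analytic gradient (hypothetical example)
f_toy(x...) = sum(xi^2 for xi in x)
function ∇f_toy(g::AbstractVector{T}, x::T...) where {T}
    for i in eachindex(x)
        g[i] = 2 * x[i]   # analytic gradient filled in place
    end
    return
end

toy = Model(Ipopt.Optimizer)
@variable(toy, x[1:3], start = 1.0)
@operator(toy, op_f, 3, f_toy, ∇f_toy)   # 3 = number of scalar arguments
@objective(toy, Min, op_f(x...))         # note the splat: the operator receives scalars, not a vector
optimize!(toy)
```

If that is right, then my vector-valued `res_AffExpr(w::AbstractVector)` may not be hooked up the way I intended, which would explain both the missing "Hello" and the fallback to AD.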

**What do you think?**

# Alternatives to Ipopt?

Returning to the comparison of NLS solvers, it seems like NLLSsolver.jl is currently the most efficient / lightweight option, but the lack of documentation and the possible lack of future maintenance of the code base make me hesitate to use it.

I also tried the JSOSolvers.jl package but had a hard time passing my user-defined functions to the solver. Any help here would be appreciated as well.
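
If I understand the JSO ecosystem correctly, the intended route is to wrap the residual in an `ADNLSModel` from ADNLPModels.jl and hand it to a least-squares solver such as `trunk`; this is roughly what I was attempting (untested sketch):

```julia
using ADNLPModels, JSOSolvers

# Wrap the residual as a plain Float64 function of w (reusing _res, RT, b, etc. from the script above)
r(w) = _res(RT, U, V, b, f!, Val(Float64), w)

# Nonlinear least-squares model with K*D residual components;
# derivatives come from automatic differentiation unless supplied explicitly
nls = ADNLSModel(r, w_rand, K * D)

stats = trunk(nls)   # trust-region Gauss-Newton-type solver from JSOSolvers
stats.solution, stats.status
```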

**Are there any other options you would suggest?**