Overview
An algorithm I am working on requires solving many largish nonlinear least squares problems, and the number of allocations is exploding. My nonlinear functions are efficient when passed Arrays of Float64 and generate few allocations, but when they are passed as an operator to Ipopt and evaluated with VariableRef or AffExpr arguments, the evaluation time and the number of allocations explode.
I can provide an analytic gradient which is also efficient to compute with Float64s, so automatic differentiation should not be necessary. However, I do not see my gradient function being evaluated, and I think this is part of why the code is so slow.
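For reference, my understanding from the JuMP documentation is that a user-defined operator takes splatted scalar arguments and that an analytic gradient is supplied as a callback that fills a vector in place. Below is a minimal standalone sketch of that pattern (the names ssq, ∇ssq, op_ssq, and mdl_sketch are just illustrative, not part of my actual code); this is the interface I am trying to map my vector-valued residual and Jacobian onto:

# Minimal sketch of a scalar operator with an analytic gradient,
# following my reading of the JuMP docs (not my actual residual).
using JuMP, Ipopt
ssq(x...) = sum(xi^2 for xi in x)        # scalar-valued function of splatted args
function ∇ssq(g::AbstractVector, x...)   # gradient callback fills g in place
    for i in eachindex(x)
        g[i] = 2 * x[i]
    end
    return
end
mdl_sketch = Model(Ipopt.Optimizer)
@variable(mdl_sketch, y[1:3], start = 1.0)
@operator(mdl_sketch, op_ssq, 3, ssq, ∇ssq)  # 3 = number of scalar arguments
@objective(mdl_sketch, Min, op_ssq(y...))    # variables are splatted into the operator
optimize!(mdl_sketch)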
I am looking for advice on how to move forward and make solving nonlinear least squares problems in Julia as fast as possible.
I should mention that I am using Julia v1.10.4 on an Intel Mac.
Background
You can probably skip this if you don’t care about the motivation and just want to see the coding problem.
I am trying to solve this problem many times because it arises from an iteratively re-weighted nonlinear least squares problem:
\min_w \tfrac{1}{2} \big\|(R^T)^{-1}(G(w;u)-b)\big\|_2^2.
This comes from trying to approximate the parameters of a differential equation of the form
\dot{u}(t) = F(u(t), w),
where u(t) is the dependent variable (possibly a vector) and w is the vector of parameters. I am extending the WENDy algorithm to handle ODEs (and then PDEs) that are nonlinear in parameters (i.e., the RHS F is nonlinear in w).
Code and Log
I am following the example shown in the Nonlinear Least Squares Feature Comparison, but something is not working quite right.
I built relatively efficient ways to evaluate r(w) := (R^T)^{-1} (G(w;u) - b) and \nabla_w r(w). When I benchmark these functions with w::Vector{Float64}, they only need tens of allocations, but when I register them as an operator on a Model, they incur thousands of allocations (see the log output).
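To see the contrast outside of the solver, the same residual can be benchmarked directly on AffExpr inputs; the sketch below mirrors the res_float64 benchmark in the script further down, using constant AffExprs as a stand-in for the expressions JuMP passes in (res_affexpr and w_aff are just illustrative names):

# Sketch: benchmark the residual on constant AffExprs, mirroring the
# res_float64 benchmark in the script below (run it after that script).
res_affexpr(w, ll) = _res(RT, U, V, b, f!, Val(AffExpr), w; ll=ll)
w_aff = AffExpr.(w_rand)                 # wrap the random point in constant AffExprs
@btime res_affexpr($w_aff, $Logging.Warn);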
I am attaching code with random data to isolate the issue, but in practice the U, V, and sig variables are all computed upstream. The f!, jacuf!, and jacwf! functions come from the Hindmarsh–Rose equation, which has an RHS F that is linear in w, but I am using it as a way to check that the NLS solver is working. These functions are all generated by ModelingToolkit.jl, but I wrote them here in a more human-readable format. Here is the code:
@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt, Tullio, BenchmarkTools
##
# Set size of problem
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
## Define necessary functions
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
    fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3]
    fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2]
    fm[3] = w[8] * um[1] + w[9] + w[10] * um[3]
    nothing
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
    jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
    jfm[2] = 2*w[6]*um[1]
    jfm[3] = w[8]
    jfm[4] = w[1]
    jfm[5] = w[7]
    jfm[6] = 0
    jfm[7] = w[4]
    jfm[8] = 0
    jfm[9] = w[10]
    nothing
end
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
    jfm .= 0
    jfm[1] = um[2]
    jfm[4] = um[1]^3
    jfm[7] = um[1]^2
    jfm[10] = um[3]
    jfm[14] = 1
    jfm[17] = um[1]^2
    jfm[20] = um[2]
    jfm[24] = um[1]
    jfm[27] = 1
    jfm[30] = um[3]
    nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
    # Preallocate for L
    D, M = size(U)
    K, _ = size(V)
    JuF = zeros(D,D,M)
    _L0 = zeros(K,D,D,M)
    _L1 = zeros(K,D,M,D)
    # Build _L1
    K,D,M,_ = size(_L1)
    @inbounds for m in 1:size(JuF,3)
        jacuf!(view(JuF,:,:,m), w, view(U,:,m))
    end
    @tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
    @tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
    permutedims!(_L1,_L0,(1,2,4,3))
    L = reshape(_L1,K*D,M*D)
    # compute covariance
    S = L * L'
    # regularize for possible ill conditioning
    R = (1-diag_reg)*S + diag_reg*I
    # compute the Cholesky factor in place for linear solve efficiency
    cholesky!(Symmetric(R))
    return UpperTriangular(R)'
end
# Define residual function
function _res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, ::Val{T}, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where {W,T}
    with_logger(ConsoleLogger(stderr, ll)) do
        K, M = size(V)
        D, _ = size(U)
        @info "++++ Res Eval ++++"
        @info " Evaluate F "
        dt = @elapsed a = @allocations begin
            F = zeros(T, D, M)
            for m in 1:size(F,2)
                f!(view(F,:,m), w, view(U,:,m))
            end
        end
        @info " $dt s, $a allocations"
        @info " Mat Mat mult "
        dt = @elapsed a = @allocations G = V * F'
        @info " $dt s, $a allocations"
        @info " Reshape "
        dt = @elapsed a = @allocations g = reshape(G, K*D)
        @info " $dt s, $a allocations"
        @info " Linear Solve "
        dt = @elapsed a = @allocations res = RT \ g
        @info " $dt s, $a allocations"
        @info " Vec Vec add "
        dt = @elapsed a = @allocations res .-= b
        @info " $dt s, $a allocations"
        @info "++++++++++++++++++"
        return res
    end
end
# Define Jacobian of residual with respect to the parameters w
function _jacRes(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, ::Val{T}, w::AbstractVector{W}; showFlag::Bool=false, ll::Logging.LogLevel=Logging.Warn) where {W,T}
    with_logger(ConsoleLogger(stderr, ll)) do
        showFlag && println(stderr, "Hello")
        K, M = size(V)
        D, _ = size(U)
        J = length(w)
        JwF = zeros(T,D,J,M)
        for m in 1:M
            jacwf!(view(JwF,:,:,m), w, view(U,:,m))
        end
        @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m]
        JG = permutedims(_JG,(3,1,2))
        jacG = reshape(JG, K*D, J)
        return RT \ jacG
    end
end
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0
res_float64(w, ll) = _res(RT,U,V,b,f!,Val(Float64), w; ll=ll)
res_float64(w_rand, Logging.Info)
@btime res_float64($w_rand, $Logging.Warn);
nothing
## Compute the nonlinear least squares solution
@info "Defining IRWLS_Nonlinear..."
function IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Info, maxIt=100, relTol=1e-4)
    with_logger(ConsoleLogger(stderr, ll)) do
        @info "Initializing the linearization least squares solution ..."
        D, M = size(U)
        K, _ = size(V)
        G0 = _jacRes(Matrix{Float64}(I, K*D, K*D), U, V, jacwf!, Val(Float64), zeros(J))
        w0 = G0 \ b0
        wit = zeros(J, maxIt)
        resit = zeros(J, maxIt)
        wnm1 = w0
        wn = similar(w0)
        for n = 1:maxIt
            RT = RTfun(U, V, Vp, sig, diag_reg, jacuf!, wnm1)
            b = RT \ b0
            ll = (n == 1) ? Logging.Info : Logging.Warn
            res_AffExpr(w::AbstractVector) = _res(RT, U, V, b, f!, Val(AffExpr), w; ll=ll)
            jacRes_AffExpr(w::AbstractVector) = _jacRes(RT, U, V, jacwf!, Val(AffExpr), w; showFlag=true)
            w_star = _jacRes(RT, U, V, jacwf!, Val(Float64), zeros(J)) \ b
            ##
            @info "Defining model for Ipopt"
            mdl = Model(Ipopt.Optimizer)
            @variable(mdl, w[j = 1:J], start = wnm1[j])
            @variable(mdl, r[k = 1:K*D])
            @operator(mdl, f, J, res_AffExpr, jacRes_AffExpr)
            @constraint(mdl, r == f(w))
            @objective(mdl, Min, sum(r.^2))
            set_silent(mdl)
            @info "Running optimizer"
            dtn = @elapsed optimize!(mdl)
            wn = value.(w)
            resn = wn - w_star
            relErr = norm(resn) / norm(w_star)
            resit[:,n] = resn
            wit[:,n] = wn
            @info """ Iteration $n
                solve time = $dtn s
                relative error = $relErr
                barrier_iterations = $(barrier_iterations(mdl))
                termination_status = $(termination_status(mdl))
                primal_status = $(primal_status(mdl))
                objective_value = $(objective_value(mdl))
            """
            if norm(wnm1 - wn) / norm(wnm1) < relTol
                resit = resit[:,1:n]
                wit = wit[:,1:n]
                @info "Convergence criterion met!"
                return wn, wit, resit
            end
            wnm1 = wn
        end
        @warn "Max number of iterations reached..."
        return wn, wit, resit
    end
end
##
@info "IRWLS (Nonlinear): "
@info " Runtime info: "
@time what, wit, resit = IRWLS_Nonlinear(U, V, Vp, b0, sig, diag_reg, J, f!, jacuf!, jacwf!; ll=Logging.Warn)
@info " iterations = $(size(wit,2)-1)"
Here is the log:
[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with Float64
[ Info: ++++ Res Eval ++++
[ Info: Evaluate F
[ Info: 1.8738e-5 s, 2 allocations
[ Info: Mat Mat mult
[ Info: 0.00034676 s, 1 allocations
[ Info: Reshape
[ Info: 1.034e-6 s, 2 allocations
[ Info: Linear Solve
[ Info: 6.248e-5 s, 1 allocations
[ Info: Vec Vec add
[ Info: 1.201e-6 s, 0 allocations
[ Info: ++++++++++++++++++
223.049 μs (13 allocations: 29.22 KiB)
[ Info: Defining IRWLS_Nonlinear...
[ Info: IRWLS (Nonlinear):
[ Info: Runtime info:
[ Info: ++++ Res Eval ++++
[ Info: Evaluate F
[ Info: 0.006429002 s, 114007 allocations
[ Info: Mat Mat mult
[ Info: 0.126229552 s, 2401 allocations
[ Info: Reshape
[ Info: 9.96e-7 s, 2 allocations
[ Info: Linear Solve
[ Info: 0.083073424 s, 596550 allocations
[ Info: Vec Vec add
[ Info: 0.000407006 s, 2600 allocations
[ Info: ++++++++++++++++++
3.877045 seconds (8.04 M allocations: 888.491 MiB, 3.14% gc time, 41.48% compilation time)
[ Info: iterations = 8
To avoid clutter, the log only shows the single test of the residual evaluated with Float64s and the first evaluation with AffExprs, but the pattern continues after the first iteration.
Possible Diagnosis
- I think that Ipopt is not using my gradient. It should be printing “Hello” but it is not. I would like the solver to use my analytic gradient (see the sketch after this list for how I am trying to confirm whether the gradient callback is ever called).
- I think Ipopt is trying to do automatic differentiation and thus is creating overhead.
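One simple way I am trying to confirm the first point, beyond the “Hello” print, is to wrap the Jacobian callback in a counting closure (a sketch; jac_calls and jacRes_counted are just illustrative names):

# Sketch: count how many times the Jacobian callback is actually invoked
# (jac_calls / jacRes_counted are illustrative names, not in my script).
jac_calls = Ref(0)
function jacRes_counted(w::AbstractVector)
    jac_calls[] += 1
    return _jacRes(RT, U, V, jacwf!, Val(Float64), w)
end
# ...pass jacRes_counted to @operator in place of jacRes_AffExpr, run optimize!,
# and then check:
@info "Jacobian callback was invoked $(jac_calls[]) times"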
What do you think?
Alternatives to Ipopt?
Returning to the comparison of NLS solvers, it seems like NLLSsolver.jl is currently the most efficient and lightweight option, but the lack of documentation and the possible lack of future maintenance of the code base make me hesitate to use it.
I tried to use the JSOSolvers.jl package but was having a hard time passing my user-defined functions to the solver. Any help here would be appreciated as well.
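For concreteness, this is roughly the kind of call I was attempting, based on my reading of the ADNLPModels.jl / JSOSolvers.jl documentation (I may well be misusing the API, and I still have not figured out how to supply my analytic Jacobian this way):

# Sketch of my JSO attempt (possibly misusing the API); ADNLSModel builds an
# NLS model whose derivatives come from AD, so the residual is written so that
# its element type follows w.
using ADNLPModels, JSOSolvers
w0 = rand(J)
RT0 = RTfun(U, V, Vp, sig, diag_reg, jacuf!, w0)
b_jso = RT0 \ b0
r(w) = _res(RT0, U, V, b_jso, f!, Val(eltype(w)), w)  # AD-friendly residual
nls = ADNLSModel(r, w0, K * D)   # K*D residuals in the least squares problem
stats = trunk(nls)               # trunk has a least squares variant, I believe
stats.solution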
Are there any other options you would suggest?