Nonlinear Least Squares Millions of Allocations

nicrummel · June 13, 2024, 5:32pm

Hopefully, I am addressing your comments @odow.

Am I Really Using JuMP?

I ran the exact script provided in a fresh REPL and the log in the original post shows the results. When I exclude JuMP from my using statements, it shows the error that @variable not defined. So I do believe that I am using the JuMP interface. Here is my package environment, and I do believe I am using the latest stable release of all packages:

(WENDy) pkg> st
Project NLPMLE v1.0.0-DEV
Status `~/.julia/dev/WENDy/Project.toml`
  [fbb218c0] BSON v0.3.9
  [0a1fb500] BlockDiagonals v0.1.42
  [7a1cc6ca] FFTW v1.8.0
  [6a3955dd] ImageFiltering v0.7.8
  [a98d9a8b] Interpolations v0.15.1
  [b964fa9f] LaTeXStrings v1.3.1
  [23992714] MAT v0.10.7
  [33e6dc65] MKL v0.6.3
  [961ee093] ModelingToolkit v9.17.1
  [1dea7af3] OrdinaryDiffEq v6.82.0
  [f0f68f2c] PlotlyJS v0.18.13
  [91a5bcdd] Plots v1.40.4
  [295af30f] Revise v3.5.14
  [90137ffa] StaticArrays v1.9.5
  [0c5d862f] Symbolics v5.30.3
  [b8865327] UnicodePlots v3.6.4
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [9a3f8284] Random
  [10745b16] Statistics v1.10.0

Using NonLinear Modeling Interface

I did read through the Nonlinear Operators documentation several times, but it was not clear how to best proceed because depending on the situation will have:

residual of different length
variables of different length

Point taken though, I can try to make several nonlinear operators with scalar outputs that have gradients rather than a jacobian.

I saw that [Tips and tricks · JuMP] suggests using a memorized function to help with operators with multiple outputs. So I attempt to use this work around. It is still adding allocations and run time which is frustrating.

Also, I found this post that shows how to build. The post is old from 2020 and suggests building an expression which matches the legacy syntax approach this problem. I used the legacy syntax, and now have faster code, but there is still considerable overhead from the memory function for the gradient computations.

@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt,  Tullio, BenchmarkTools
## 
# Set size of problem 
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10 
D = 3
diag_reg = 1e-10
# Build Random data datrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise 
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
nothing
## Define necessary functions 
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
    fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3] 
    fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2] 
    fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
    jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
    jfm[2] = 2*w[6]*um[1]
    jfm[3] = w[8]
    jfm[4] = w[1]
    jfm[5] = w[7]
    jfm[6] = 0
    jfm[7] = w[4]
    jfm[8] = 0
    jfm[9] = w[10]
    nothing 
end 
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
    jfm .= 0
    jfm[1] = um[2]
    jfm[4] = (^)(um[1], 3)
    jfm[7] = (^)(um[1], 2)
    jfm[10] = um[3]
    jfm[14] = 1
    jfm[17] = (^)(um[1], 2)
    jfm[20] = um[2]
    jfm[24] = um[1]
    jfm[27] = 1
    jfm[30] = um[3]
    nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
    # Preallocate for L 
    D, M = size(U)
    K, _ = size(V)
    JuF = zeros(D,D,M)
    _L0 = zeros(K,D,D,M)
    _L1  = zeros(K,D,M,D)
    # Get _L1
    K,D,M,_ = size(_L1)
    @inbounds for m in 1:M
        jacuf!(view(JuF,:,:,m), w, view(U,:,m))
    end
    @tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
    @tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
    permutedims!(_L1,_L0,(1,2,4,3))
    L = reshape(_L1,K*D,M*D)
    # compute covariance
    S = L * L'
    # regularize for possible ill conditioning
    R = (1-diag_reg)*S + diag_reg*I
    # compute cholesky for lin solv efficiency
    cholesky!(Symmetric(R))
    return UpperTriangular(R)'
end
# Define residual function 
function res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        @info " Evaluate F "
        dt = @elapsed a = @allocations begin 
            F = zeros(W, D, M)
            for m in 1:size(F,2)
                f!(view(F,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Mat Mat mult "
        dt = @elapsed a = @allocations G = V * F'
        @info "  $dt s, $a allocations"
        @info " Reshape "
        dt = @elapsed a = @allocations g = reshape(G, K*D)
        @info "  $dt s, $a allocations"
        @info " Linear Solve "
        dt = @elapsed a = @allocations res = RT \ g
        @info "  $dt s, $a allocations"
        @info " Vec Vec add "
        dt = @elapsed a = @allocations res .-= b
        @info "  $dt s, $a allocations"
        @info "++++++++++++++++++"
        return res
    end
end
# Define Jacobian of residual with respect to the parameters w 
function ∇res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
    with_logger(ConsoleLogger(stderr, ll)) do 
        @info "++++ Jac Res Eval ++++"
        K, M = size(V)
        D, _ = size(U)
        J = length(w)
        @info " Evaluating jacobian of F"
        dt = @elapsed a = @allocations begin
            JwF = zeros(W,D,J,M)
            for m in 1:M
                jacwf!(view(JwF,:,:,m), w, view(U,:,m))
            end
        end
        @info "  $dt s, $a allocations"
        @info " Computing V ×_3 jacF with tullio"
        dt = @elapsed a = @allocations @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m] 
        @info "  $dt s, $a allocations"
        @info " permutedims"
        dt = @elapsed a = @allocations JG = permutedims(_JG,(3,1,2))
        @info "  $dt s, $a allocations"
        @info " Reshape"
        dt = @elapsed a = @allocations jacG = reshape(JG, K*D, J)
        @info "  $dt s, $a allocations"
        @info " Linsolve"
        dt = @elapsed a = @allocations ∇res = RT \ jacG
        @info "  $dt s, $a allocations"
        return ∇res
    end
end
function memorize_f(f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
    _w, _f = NTuple{J,W}(zeros(W,J)), zeros(W,J)
    function f_kd(kd::Int, wargs::W...)
        if wargs !== _w
                _w, _f = wargs, f(wargs...)
        end
        return  _f[kd]::W 
    end
    return [(wargs...) -> f_kd(kd, wargs...) for kd in 1:KD]
end
 
function memorize_∇f!(∇f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
    _w, _∇f = NTuple{J,W}(zeros(W,J)), zeros(W,KD,J)
    function ∇f_kd!(ret::AbstractVector{W}, kd::Int, wargs::W...)
        if wargs !== _w
            _w = wargs
            _∇f = ∇f(wargs...)
        end
        ret .= view(_∇f, kd, :)  
        nothing 
    end
    return [(∇f_kd, wargs...) -> ∇f_kd!(∇f_kd, kd, wargs...) for kd in 1:KD]
end
#
res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = res(RT,U,V,b,f!,w;ll=ll)
∇res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(RT,U,V,jacwf!,w;ll=ll)
res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = res(collect(wargs);ll=ll)
∇res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(collect(wargs);ll=ll)
res_kd = memorize_f(res, K*D, Val(J), Val(eltype(U)))
∇res_kd = memorize_∇f!(∇res, K*D, Val(J), Val(eltype(U)))
res_mem(wargs::W...) where W = reduce(vcat, res_kd[kd](wargs...) for kd in 1:K*D)
function ∇res_mem!(∇res::AbstractMatrix{W}, wargs::W...) where W
    for kd in 1:K*D
        ∇res_kd[kd](view(∇res,kd,:), wargs...) 
    end
    nothing
end 
## 
@info "Test Allocations with with Float64"
w_rand = rand(J)
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0 
# because this problem is linear in w the jacobian is constant, 
# and we could solve this problem with backslash because 
# it is really a linear least squares problem
w_star = ∇res(zeros(J)) \ b 
##
res_w_rand = res(w_rand)
res_w_rand_splat = res(w_rand...)
res_mem_w_rand = res_mem(w_rand...)
@assert all(res_w_rand .== res_w_rand_splat)
@assert all(res_w_rand .== res_mem_w_rand)
∇res_w_rand = ∇res(w_rand)
∇res_w_rand_splat = ∇res(w_rand...) 
∇res_mem_w_rand = zeros(K*D,J)
∇res_mem!(∇res_mem_w_rand,w_rand...)
@assert all(∇res_w_rand .== ∇res_w_rand_splat)
@assert all(∇res_w_rand .== ∇res_mem_w_rand)
nothing 
##
@info " Benchmark res"
@time res(rand(J))
@info " Benchmark res splatting"
@time res(rand(J)...)
@info " Benchmark res memorized"
@time res_mem(rand(J)...)
@info " Benchmark ∇res "
@time ∇res(rand(J))
@info " Benchmark ∇res splatting"
@time ∇res(rand(J)...) 
@info " Benchmark ∇res memorized"
∇res_mem_w_rand = zeros(K*D,J)
@time ∇res_mem!(∇res_mem_w_rand,rand(J)...);
nothing
##
@info "Test the NLS in isolation"
@info "Defining model for Ipopt"
mdl = Model(Ipopt.Optimizer)
@variable(mdl, w[j = 1:J], start = w_rand[j])
@variable(mdl, r[k = 1:K*D])
for kd in 1:K*D
    fk = Symbol("f$kd")
    register(mdl, fk, J, res_kd[kd], ∇res_kd[kd];autodiff=false)
    add_nonlinear_constraint(mdl, :($(fk)($(w)...) == $(r[kd])) )
end
@objective(mdl, Min, sum(r .^2 ) ) 
set_silent(mdl)
@info "Running optimizer"
@time optimize!(mdl)
w_hat = value.(w)
abserr = norm(w_hat - w_star)
relerr = abserr / norm(w_star)
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
obj_star = sum(res(w_star...).^2)
abserr = abs(objective_value(mdl) - obj_star)
relerr = abserr / obj_star
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"

With log

[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with with Float64
[ Info:  Benchmark res
  0.000531 seconds (14 allocations: 29.359 KiB)
[ Info:  Benchmark res splatting
  0.000415 seconds (37 allocations: 29.922 KiB)
[ Info:  Benchmark res memorized
  0.000937 seconds (8.14 k allocations: 593.938 KiB)
[ Info:  Benchmark ∇res 
  0.002232 seconds (65 allocations: 308.188 KiB)
[ Info:  Benchmark ∇res splatting
  0.001899 seconds (88 allocations: 308.750 KiB)
[ Info:  Benchmark ∇res memorized
  0.003405 seconds (10.59 k allocations: 613.578 KiB)

[ Info: Test the NLS in isolation
[ Info: Defining model for Ipopt
[ Info: Running optimizer
  0.314099 seconds (1.02 M allocations: 38.915 MiB, 4.15% gc time)
[ Info: Absolute coeff error = 8.351175528130465e-10
[ Info: Relative coeff error = 1.8051495629515027e-10
[ Info: Absolute coeff error = 0.0
[ Info: Relative coeff error = 0.0

Questions

When I tried to use the new @operator and @constraint functions, I was getting errors deep in the MathOptInterface. How would you write equivalent code using the updated syntax?
This is faster than it was originally and it has fewer allocations, but is still slower than I would have hoped. Why is this the case? Is there more to be done to optimize the memory functions? It really seems like a bummer to have to use a hack like this to solve problem a “large” residual and variables. In the big scheme of things hundreds of variables is not large, and this will scale poorly if the problem size gets larger.

Topic		Replies	Views
Struggle at large scale problem of nlp using ipopt Optimization (Mathematical) jump , ipopt	10	232	June 26, 2024
Comparing non-linear least squares solvers Optimization (Mathematical) nonlinear , nonlinear-optimizati	37	4178	March 5, 2024
Nonlinear Optimization with Many Constraints + Autodifferentiation: Which Julia Solution? Optimization (Mathematical) jump , ipopt , nonlinear-optimizati	23	909	March 15, 2024
Ipopt and external function Optimization (Mathematical) jump	11	1150	September 30, 2020
Ipopt in Julia without JumP and automatic hessian calculation? Optimization (Mathematical)	5	1582	January 23, 2019

Nonlinear Least Squares Millions of Allocations

Am I Really Using JuMP?

Using NonLinear Modeling Interface

Questions

Related topics