Hopefully this addresses your comments, @odow.
Am I Really Using JuMP?
I ran the exact script provided in a fresh REPL, and the log in the original post shows the results. When I exclude JuMP from my `using` statements, I get an error that `@variable` is not defined, so I do believe I am using the JuMP interface. Here is my package environment; I believe I am on the latest stable release of all packages:
(WENDy) pkg> st
Project NLPMLE v1.0.0-DEV
Status `~/.julia/dev/WENDy/Project.toml`
[fbb218c0] BSON v0.3.9
[0a1fb500] BlockDiagonals v0.1.42
[7a1cc6ca] FFTW v1.8.0
[6a3955dd] ImageFiltering v0.7.8
[a98d9a8b] Interpolations v0.15.1
[b964fa9f] LaTeXStrings v1.3.1
[23992714] MAT v0.10.7
[33e6dc65] MKL v0.6.3
[961ee093] ModelingToolkit v9.17.1
[1dea7af3] OrdinaryDiffEq v6.82.0
[f0f68f2c] PlotlyJS v0.18.13
[91a5bcdd] Plots v1.40.4
[295af30f] Revise v3.5.14
[90137ffa] StaticArrays v1.9.5
[0c5d862f] Symbolics v5.30.3
[b8865327] UnicodePlots v3.6.4
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[9a3f8284] Random
[10745b16] Statistics v1.10.0
Using the Nonlinear Modeling Interface
I did read through the Nonlinear Operators documentation several times, but it was not clear how best to proceed because, depending on the situation, I will have:
- residuals of different lengths
- variables of different lengths

Point taken though: I can try to make several nonlinear operators with scalar outputs that have gradients rather than a Jacobian.
I saw that [Tips and tricks · JuMP] suggests using a memoized function to help with operators that have multiple outputs, so I attempted that workaround. It still adds allocations and run time, which is frustrating.
Also, I found an older post from 2020 that suggests building an expression, which matches the legacy-syntax approach to this problem. I used the legacy syntax and now have faster code, but there is still considerable overhead from the memoized functions in the gradient computations.
@info "Loading dependencies"
# std lib
using LinearAlgebra, Logging, Random
# external dependencies
using JuMP, Ipopt, Tullio, BenchmarkTools
##
# Set size of problem
@info "Generate random data..."
Random.seed!(1)
K = 100
M = 1000
J = 10
D = 3
diag_reg = 1e-10
# Build random data matrix
U = rand(D, M)
u = reshape(U, D*M)
# Give random noise
sig = rand(D)
# Build random matrices with orthogonal rows
V = Matrix(qr(rand(M,K)).Q')[1:K,:]
Vp = Matrix(qr(rand(M,K)).Q')[1:K,:];
# Build lhs of ODE
b0 = reshape(Vp * U', K*D)
nothing
## Define necessary functions
@info "Building functions..."
# Define functions from the ODE RHS
function f!(fm, w, um)
fm[1] = w[1] * um[2] + w[2] * um[1]^3 + w[3] * um[1]^2 + w[4] * um[3]
fm[2] = w[5] + w[6] * um[1]^2 + w[7] * um[2]
fm[3] = w[8] *um[1] + w[9] + w[10] * um[3]
end
# jacobian of the RHS with respect to u
function jacuf!(jfm, w, um)
jfm[1] = 2*w[3]*um[1] + 3*w[2]*um[1]^2
jfm[2] = 2*w[6]*um[1]
jfm[3] = w[8]
jfm[4] = w[1]
jfm[5] = w[7]
jfm[6] = 0
jfm[7] = w[4]
jfm[8] = 0
jfm[9] = w[10]
nothing
end
# jacobian of the right hand side with respect to w
function jacwf!(jfm, w, um)
jfm .= 0
jfm[1] = um[2]
jfm[4] = um[1]^3
jfm[7] = um[1]^2
jfm[10] = um[3]
jfm[14] = 1
jfm[17] = um[1]^2
jfm[20] = um[2]
jfm[24] = um[1]
jfm[27] = 1
jfm[30] = um[3]
nothing
end
# Get the cholesky decomposition of our current approximation of the covariance
function RTfun(U::AbstractMatrix, V::AbstractMatrix, Vp::AbstractMatrix, sig::AbstractVector, diag_reg::Real, jacuf!::Function, w::AbstractVector)
# Preallocate for L
D, M = size(U)
K, _ = size(V)
JuF = zeros(D,D,M)
_L0 = zeros(K,D,D,M)
_L1 = zeros(K,D,M,D)
# Get _L1
K,D,M,_ = size(_L1)
@inbounds for m in 1:M
jacuf!(view(JuF,:,:,m), w, view(U,:,m))
end
@tullio _L0[k,d2,d1,m] = V[k,m] * JuF[d2,d1,m] * sig[d1]
@tullio _L0[k,d,d,m] += Vp[k,m]*sig[d]
permutedims!(_L1,_L0,(1,2,4,3))
L = reshape(_L1,K*D,M*D)
# compute covariance
S = L * L'
# regularize for possible ill conditioning
R = (1-diag_reg)*S + diag_reg*I
# compute cholesky for lin solv efficiency
cholesky!(Symmetric(R))
return UpperTriangular(R)'
end
# Define residual function
function res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector, f!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
with_logger(ConsoleLogger(stderr, ll)) do
@info "++++ Res Eval ++++"
K, M = size(V)
D, _ = size(U)
@info " Evaluate F "
dt = @elapsed a = @allocations begin
F = zeros(W, D, M)
for m in 1:size(F,2)
f!(view(F,:,m), w, view(U,:,m))
end
end
@info " $dt s, $a allocations"
@info " Mat Mat mult "
dt = @elapsed a = @allocations G = V * F'
@info " $dt s, $a allocations"
@info " Reshape "
dt = @elapsed a = @allocations g = reshape(G, K*D)
@info " $dt s, $a allocations"
@info " Linear Solve "
dt = @elapsed a = @allocations res = RT \ g
@info " $dt s, $a allocations"
@info " Vec Vec add "
dt = @elapsed a = @allocations res .-= b
@info " $dt s, $a allocations"
@info "++++++++++++++++++"
return res
end
end
# Define Jacobian of residual with respect to the parameters w
function ∇res(RT::AbstractMatrix, U::AbstractMatrix, V::AbstractMatrix, jacwf!::Function, w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W
with_logger(ConsoleLogger(stderr, ll)) do
@info "++++ Jac Res Eval ++++"
K, M = size(V)
D, _ = size(U)
J = length(w)
@info " Evaluating jacobian of F"
dt = @elapsed a = @allocations begin
JwF = zeros(W,D,J,M)
for m in 1:M
jacwf!(view(JwF,:,:,m), w, view(U,:,m))
end
end
@info " $dt s, $a allocations"
@info " Computing V ×_3 jacF with tullio"
dt = @elapsed a = @allocations @tullio _JG[d,j,k] := V[k,m] * JwF[d,j,m]
@info " $dt s, $a allocations"
@info " permutedims"
dt = @elapsed a = @allocations JG = permutedims(_JG,(3,1,2))
@info " $dt s, $a allocations"
@info " Reshape"
dt = @elapsed a = @allocations jacG = reshape(JG, K*D, J)
@info " $dt s, $a allocations"
@info " Linsolve"
dt = @elapsed a = @allocations ∇res = RT \ jacG
@info " $dt s, $a allocations"
return ∇res
end
end
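# Memoization helpers (following the pattern suggested in Tips and tricks):
# cache the last w at which the full residual was evaluated so that the K*D
# scalar-output operators can share a single evaluation of f.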
function memorize_f(f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
_w, _f = NTuple{J,W}(fill(W(NaN), J)), zeros(W, KD) # NaN init forces a recompute on the first call; cache has length K*D
function f_kd(kd::Int, wargs::W...)
if wargs !== _w
_w, _f = wargs, f(wargs...)
end
return _f[kd]::W
end
return [(wargs...) -> f_kd(kd, wargs...) for kd in 1:KD]
end
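# Same caching trick for the gradients: each returned closure fills `ret` with
# row kd of the full (K*D)-by-J Jacobian, recomputing it only when w changes.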
function memorize_∇f!(∇f::Function, KD::Int, ::Val{J}, ::Val{W}) where W
_w, _∇f = NTuple{J,W}(fill(W(NaN), J)), zeros(W, KD, J) # NaN init forces a recompute on the first call
function ∇f_kd!(ret::AbstractVector{W}, kd::Int, wargs::W...)
if wargs !== _w
_w = wargs
_∇f = ∇f(wargs...)
end
ret .= view(_∇f, kd, :)
nothing
end
return [(∇f_kd, wargs...) -> ∇f_kd!(∇f_kd, kd, wargs...) for kd in 1:KD]
end
# Convenience wrappers that close over the global problem data (RT, U, V, b)
res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = res(RT,U,V,b,f!,w;ll=ll)
∇res(w::AbstractVector{W}; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(RT,U,V,jacwf!,w;ll=ll)
res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = res(collect(wargs);ll=ll)
∇res(wargs::W...; ll::Logging.LogLevel=Logging.Warn) where W = ∇res(collect(wargs);ll=ll)
res_kd = memorize_f(res, K*D, Val(J), Val(eltype(U)))
∇res_kd = memorize_∇f!(∇res, K*D, Val(J), Val(eltype(U)))
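# Reassemble the full residual / Jacobian from the memoized scalar pieces
# (used below only to verify the memoized versions agree and to benchmark them)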
res_mem(wargs::W...) where W = reduce(vcat, res_kd[kd](wargs...) for kd in 1:K*D)
function ∇res_mem!(∇res::AbstractMatrix{W}, wargs::W...) where W
for kd in 1:K*D
∇res_kd[kd](view(∇res,kd,:), wargs...)
end
nothing
end
##
@info "Test Allocations with with Float64"
w_rand = rand(J)
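# Fix the covariance factor at a random w and transform the ODE lhs b0 accordingly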
RT = RTfun(U,V,Vp,sig,diag_reg,jacuf!,w_rand)
b = RT \ b0
# Because this problem is linear in w, the Jacobian is constant,
# and we can recover the reference solution with backslash:
# it is really a linear least squares problem
w_star = ∇res(zeros(J)) \ b
##
res_w_rand = res(w_rand)
res_w_rand_splat = res(w_rand...)
res_mem_w_rand = res_mem(w_rand...)
@assert all(res_w_rand .== res_w_rand_splat)
@assert all(res_w_rand .== res_mem_w_rand)
∇res_w_rand = ∇res(w_rand)
∇res_w_rand_splat = ∇res(w_rand...)
∇res_mem_w_rand = zeros(K*D,J)
∇res_mem!(∇res_mem_w_rand,w_rand...)
@assert all(∇res_w_rand .== ∇res_w_rand_splat)
@assert all(∇res_w_rand .== ∇res_mem_w_rand)
nothing
##
@info " Benchmark res"
@time res(rand(J))
@info " Benchmark res splatting"
@time res(rand(J)...)
@info " Benchmark res memorized"
@time res_mem(rand(J)...)
@info " Benchmark ∇res "
@time ∇res(rand(J))
@info " Benchmark ∇res splatting"
@time ∇res(rand(J)...)
@info " Benchmark ∇res memorized"
∇res_mem_w_rand = zeros(K*D,J)
@time ∇res_mem!(∇res_mem_w_rand,rand(J)...);
nothing
##
@info "Test the NLS in isolation"
@info "Defining model for Ipopt"
mdl = Model(Ipopt.Optimizer)
@variable(mdl, w[j = 1:J], start = w_rand[j])
@variable(mdl, r[k = 1:K*D])
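# Register one scalar operator per residual entry (legacy syntax) and
# constrain it to equal the auxiliary variable r[kd]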
for kd in 1:K*D
fk = Symbol("f$kd")
register(mdl, fk, J, res_kd[kd], ∇res_kd[kd];autodiff=false)
add_nonlinear_constraint(mdl, :($(fk)($(w)...) == $(r[kd])) )
end
@objective(mdl, Min, sum(r .^2 ) )
set_silent(mdl)
@info "Running optimizer"
@time optimize!(mdl)
w_hat = value.(w)
abserr = norm(w_hat - w_star)
relerr = abserr / norm(w_star)
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
obj_star = sum(res(w_star...).^2)
abserr = abs(objective_value(mdl) - obj_star)
relerr = abserr / obj_star
@info "Absolute coeff error = $abserr"
@info "Relative coeff error = $relerr"
With the log:
[ Info: Loading dependencies
[ Info: Generate random data...
[ Info: Building functions...
[ Info: Test Allocations with Float64
[ Info: Benchmark res
0.000531 seconds (14 allocations: 29.359 KiB)
[ Info: Benchmark res splatting
0.000415 seconds (37 allocations: 29.922 KiB)
[ Info: Benchmark res memorized
0.000937 seconds (8.14 k allocations: 593.938 KiB)
[ Info: Benchmark ∇res
0.002232 seconds (65 allocations: 308.188 KiB)
[ Info: Benchmark ∇res splatting
0.001899 seconds (88 allocations: 308.750 KiB)
[ Info: Benchmark ∇res memorized
0.003405 seconds (10.59 k allocations: 613.578 KiB)
[ Info: Test the NLS in isolation
[ Info: Defining model for Ipopt
[ Info: Running optimizer
0.314099 seconds (1.02 M allocations: 38.915 MiB, 4.15% gc time)
[ Info: Absolute coeff error = 8.351175528130465e-10
[ Info: Relative coeff error = 1.8051495629515027e-10
[ Info: Absolute objective error = 0.0
[ Info: Relative objective error = 0.0
Questions
- When I tried to use the new `@operator` and `@constraint` syntax, I was getting errors deep in MathOptInterface (a rough sketch of what I attempted is below). How would you write equivalent code using the updated syntax?
- This is faster than it was originally and has fewer allocations, but it is still slower than I would have hoped. Why is this the case? Is there more to be done to optimize the memoized functions? It seems like a bummer to have to resort to a workaround like this for a problem with a "large" residual and many variables. In the big scheme of things hundreds of variables is not large, and this will scale poorly as the problem size grows.
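For reference, here is a minimal sketch of roughly what I attempted with the updated syntax, reusing the memoized scalar functions `res_kd` and `∇res_kd` from the script above. I am assuming the `add_nonlinear_operator` form (the programmatic equivalent of `@operator`, since the operator names are generated in a loop), and names like `op_res_$kd` are just placeholders. This is the pattern that errors for me, so treat it as a sketch of intent rather than working code.

mdl = Model(Ipopt.Optimizer)
@variable(mdl, w[j = 1:J], start = w_rand[j])
@variable(mdl, r[k = 1:K*D])
for kd in 1:K*D
    # one scalar-output operator per residual entry, with its in-place gradient
    # (hypothetical names; this is roughly the loop that triggered the MOI errors for me)
    op = add_nonlinear_operator(mdl, J, res_kd[kd], ∇res_kd[kd]; name = Symbol("op_res_$kd"))
    # tie the operator's value to the auxiliary variable r[kd]
    @constraint(mdl, op(w...) == r[kd])
end
@objective(mdl, Min, sum(r .^ 2))
set_silent(mdl)
optimize!(mdl)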