I have an optimization problem in which some of the functions are approximated by neural networks, so I represent them with
Flux. The package
FluxOptTools has allowed me to use
Optim to optimize my "loss" function
L(), which depends on these networks,
NNu and NNx. I created a MWE below, which is a reduced, toy version of my real implementation.
L() is an Augmented Lagrangian; the full code has the proper outer loop that updates the multipliers and penalty terms, but for brevity I removed it from this MWE.
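For context, the outer loop I removed follows the textbook Augmented Lagrangian updates; schematically it looks like this (a sketch, not my real code: `optimize_inner!` stands for the inner minimization of L() over the weights, and the names/growth factors are illustrative):

```julia
# Schematic outer Augmented Lagrangian loop (illustrative names).
# First-order multiplier update: υᵢ ← υᵢ + 2μᵢ hᵢ(),
# with the penalties μᵢ grown while the constraints remain violated.
for k in 1:n_outer
    optimize_inner!()          # minimize L() over the NN weights (e.g. the LBFGS call below)
    υ₁ += 2μ₁*h₁();  υ₂ += 2μ₂*h₂();  υ₃ += 2μ₃*h₃()
    μ₁ *= 10;        μ₂ *= 10;        μ₃ *= 10   # or grow only unmet constraints
end
```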
My question is: is there any alternative to
FluxOptTools for doing what I am doing, so that I have access to algorithms
Optim.jl doesn't provide?
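For concreteness, here is the kind of interface I am after, sketched with Optimisers.destructure (untested sketch; `loss_from_models` is a hypothetical function that would evaluate my L() from the rebuilt networks instead of the globals):

```julia
using Flux, Zygote, Optimisers

# Flatten both networks into one parameter vector `p0`;
# `re` rebuilds the (NNumodel, NNxmodel) tuple from any such vector.
p0, re = Optimisers.destructure((NNumodel, NNxmodel))

# A plain Vector-in, scalar-out objective that any vector-based
# optimizer could consume. `loss_from_models` is hypothetical.
function loss(p)
    NNu_m, NNx_m = re(p)
    return loss_from_models(NNu_m, NNx_m)
end

grad(p) = Zygote.gradient(loss, p)[1]   # matching gradient, also a plain Vector
```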
My domain is time, which is integrated out in
f() and in the constraints, and the initial condition
x₀, which is iterated over in
J(); because of this structure I failed when using the
train! function. I have also looked into
Surrogates.jl but couldn't figure out an alternative approach. I want to try optimization algorithms
Optim.jl does not provide.
I have spent a couple of weeks trying to figure this out and failed, so I am asking for your help.
Here is the MWE, which runs, so you can get a better idea of what I am doing.
Feel free to criticize anything you might find odd. Thank you!
```julia
using Optim, LinearAlgebra, FastGaussQuadrature, QuadGK, Zygote, Flux, FluxOptTools

const T = 5
const X₀ = 5
const N = 2
const M = 4
const r = 4

icrange = collect(0:0.5:X₀)
x₂⁰ = 1.0                    # initial velocity
xT¹, xT² = 3.0, 0.0          # final position and velocity
xT = [xT¹, xT²]

NNumodel = Chain(Dense(2, 5, elu), Dense(5, 1)) |> f64
NNu(t, x₀) = NNumodel(vcat(t, x₀))

NNxmodel = Chain(Dense(2, 5, elu), Dense(5, 2)) |> f64
NNx(t, x₀) = NNxmodel(vcat(t, x₀))

# Central finite difference of the state network in time
function ẋ(t, x₀)
    Δt = 10e-6
    return [(NNx(t + Δt, x₀)[i] - NNx(t - Δt, x₀)[i]) / (2Δt) for i in 1:N]
end

# Objective functional (NNu returns a 1-element vector, hence `only`)
J(t) = r * sum(only(NNu(t, x₀))^2 for x₀ in icrange)

# Quadrature transform: map Gauss–Lobatto nodes/weights from [-1, 1] to [0, T]
t(x) = T/2 * x + T/2
W(w) = T/2 * w

# Integration nodes
const pts = 60
x, w = gausslobatto(pts)
xₜ = t.(x)
wₜ = W.(w)

#--- Objective functional integral ---#
f() = dot(wₜ, J.(xₜ))

#--- True dynamics ---#
g(t, x₀) = [0 1; 0 0] * NNx(t, x₀) + [0, 1] * only(NNu(t, x₀))

#--- Equality constraints ---#
h₁() = sum(dot(wₜ, 0.5 * norm.(ẋ.(xₜ, Ref(x₀)) - g.(xₜ, Ref(x₀))).^2) for x₀ in icrange)
h₂() = sum(1/2 * norm(NNx(T, x₀) - xT)^2 for x₀ in icrange)
h₃() = sum(1/2 * norm(NNx(0, x₀) - [x₀, x₂⁰])^2 for x₀ in icrange)

# "Loss" -- Augmented Lagrangian
function L()
    return f() +
           υ₁*h₁() + υ₂*h₂() + υ₃*h₃() +
           μ₁*h₁()^2 + μ₂*h₂()^2 + μ₃*h₃()^2
end

μ₁ = μ₂ = μ₃ = 10.0
υ₁ = υ₂ = υ₃ = 50*rand()

Zygote.refresh()
θ = Flux.params(NNumodel, NNxmodel)
Lfun, gradfun, fg!, p0 = optfuns(L, θ)
opt = optimize(Optim.only_fg!(fg!), p0, LBFGS(),
               Optim.Options(iterations=5_000, g_tol=10e-3,
                             store_trace=true, show_every=5, show_trace=true))
```
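After the call above, I inspect the result with the usual Optim accessors (as far as I understand, FluxOptTools' fg! copies each trial point into θ, so the network weights already hold the last evaluated point):

```julia
# Inspect the outcome of the LBFGS run.
println("converged: ", Optim.converged(opt))
println("minimum:   ", Optim.minimum(opt))
p_opt = Optim.minimizer(opt)   # flat parameter vector at the optimum
```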