Parameter optimization of MethodOfLines PDE

thompsonmj · July 22, 2022, 5:53pm

I’d like to optimize parameters in the MOL example shown here with an RMSD cost function to minimize.

Following a WIP example here to build the OptimizationFunction, I have the following, but the results come back predicting the max of the bounded range.

using ModelingToolkit, MethodOfLines, OrdinaryDiffEq, DomainSets
using Plots
using DifferentialEquations, Optimization, OptimizationPolyalgorithms, Zygote, OptimizationOptimJL
using SciMLSensitivity
using Statistics

@parameters t x
p = @parameters Dn, Dp
@variables u(..) v(..)
Dt = Differential(t)
Dx = Differential(x)
Dxx = Differential(x)^2

eqs  = [Dt(u(t,x)) ~ Dn * Dxx(u(t,x)) + u(t,x)*v(t,x),
        Dt(v(t,x)) ~ Dp * Dxx(v(t,x)) - u(t,x)*v(t,x)]
bcs = [u(0,x) ~ sin(pi*x/2),
       v(0,x) ~ sin(pi*x/2),
       u(t,0) ~ 0.0, Dx(u(t,1)) ~ 0.0,
       v(t,0) ~ 0.0, Dx(v(t,1)) ~ 0.0]

domains = [t ∈ Interval(0.0,1.0),
           x ∈ Interval(0.0,1.0)]

@named pdesys = PDESystem(eqs,bcs,domains,[t,x],[u(t,x),v(t,x)],[Dn=>0.5, Dp=>2.0])

discretization = MOLFiniteDifference([x=>0.1],t)

pde_prob = discretize(pdesys,discretization)

sol = solve(pde_prob,Tsit5(),saveat=1.0)
meas = sol.u[2] .+ 0.01*randn(18)

idcs = Int.(ModelingToolkit.varmap_to_vars([p[1]=>1, p[2]=>2], p))

function loss(θ)
    new_pde_prob = remake(pde_prob,p=θ[idcs])
    sol = solve(new_pde_prob, Tsit5())
    return sqrt(mean( (sol.u[2] .- meas).^2 )), Array(sol)
end

ad_type = Optimization.AutoZygote()
opt_func = OptimizationFunction((x,p)->loss(x), ad_type)

guess = [1.0,1.0]

opt_prob = OptimizationProblem(opt_func, guess, lb=[0.1,0.1],ub=[4.0,4.0])

res = Optimization.solve(opt_prob, GradientDescent())

Here’s the cost landscape in the bounded region:

And here’s the predicted parameters:

u: 2-element Vector{Float64}:
 3.999999997189557
 3.9999999981732874

ChrisRackauckas · July 26, 2022, 2:29am

Did you try decreasing the tolerance on the solve to improve the gradient accuracy? Did you try optimizers other than GradientDescent?

thompsonmj · July 26, 2022, 3:06pm

I tried of SAMIN, ParticleSwarm, NelderMead, GradientDescent, ConjugateGradient, SimmulatedAnnealing, and BFGS and set solve kwargs to x_tol=1e-48,f_tol=1e-48,g_tol=1e-48. Also tried different inital guess values. Might also be worth mentioning this warning when trying to use abstol for each of these solvers.

 Warning: common abstol is currently not used by Fminbox{BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, Float64, Optim.var"#49#51"}(BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}(LineSearches.InitialStatic{Float64}

The results often come back at the max of the bounds or simply wrong answers.

Might be something related to this issue?

ChrisRackauckas · July 26, 2022, 3:51pm

Seems related. Did you try NLopt algorithms? It seems you only tried Optim.jl methods.

thompsonmj · July 27, 2022, 1:37am

Cycled through a number of the NLopt options as well and no successes using res = Optimization.solve(opt_prob, NLopt.ALG_NAME()).

Most go to ~[4.0, 4.0] or max of ub, a couple settle on solutions greater than ub, and a couple settle on other incorrect solutions in the bounds. A few like LN_AUGLAG(), LD_AUGLAG(), and LD_AUGLAG_EQ() actually kill the Julia sesson and leave:

The terminal process "C:\Users\thomp\AppData\Local\Programs\Julia-1.7.1\bin\julia.exe '-i', '--banner=no', '--project=C:\Users\thomp\.julia\environments\v1.7', 'c:\Users\thomp\.vscode\extensions\julialang.language-julia-1.6.28\scripts\terminalserver\terminalserver.jl', '\\.\pipe\vsc-jl-repl-794293b1-558a-425f-a45a-e15fdcdb0fe9', '\\.\pipe\vsc-jl-cr-5b783b6a-1591-42ad-b686-ad595474f7b0', 'USE_REVISE=true', 'USE_PLOTPANE=true', 'USE_PROGRESS=true', 'ENABLE_SHELL_INTEGRATION=false', 'DEBUG_MODE=false'" terminated with exit code: 1.

Several gradient and global solvers took too long and I pulled their plug after several minutes.

ChrisRackauckas · July 27, 2022, 3:19am

No, not those tolerances. The abstol and reltol of the ODE solve.

pkofod · July 27, 2022, 7:29am

I asked on github as well, but… Are you sure it’s not optimal to go to the boundaries then, if all optimizers do?

thompsonmj · July 27, 2022, 1:50pm

This got the expected values for the optimization!

# ...

sol = solve(pde_prob,Tsit5(),saveat=1.0,abstol=1e-64)

# ...

function loss(θ)
    new_pde_prob = remake(pde_prob,p=θ[idcs])
    sol = solve(new_pde_prob, Tsit5(), saveat=1.0, abstol=1e-64)
    return sqrt(mean( (sol.u[2] .- meas).^2 )), Array(sol)
end

With NLopt and Optim methods (including ParticleSwarm) agreeing on parameters for cost against noise-added:

u: 2-element Vector{Float64}:
 0.48967039327895784
 2.7541826258158166

Higher values for abstol work too, but this was the fix I needed.

I refer those struggling similarly here.

ChrisRackauckas · July 27, 2022, 2:09pm

Awesome.

Yeah I think the thing most people don’t realize is that AD is always considered “exact”, but AD of an equation solver produces an equation to solve, and so your gradient is only as exact as the solve of that extra equation. By default (there is a way to modify this) the sensitivity equations solve at the same tolerance of the forward solve, so you get gradients which “are as accurate” as the forward solve (subject to usual issues when talking about heuristic accuracy). This means that if you solve forward with a relative tolerance of 1e-3, you have a relative tolerance on the global solution of around 1e-1 (since tolerance in equation solvers is local not global, as described in that FAQ. This is a general thing for all solvers in all languages BTW so a general thing beyond Julia as well), and thus you also have a global gradient accuracy of around 1e-1.

This will definitely make gradient-based methods perform weird, which means you need a lower tolerance when doing optimization, if not for the solution itself, at least for the gradient accuracy. If using a non-gradient based approach like particle swarm, it can still be helpful because inaccuracies in the solve present itself as akin to stochasticity in the loss function, and many optimizers can have issue with stochasticity. That’s likely what caused particle swarm to have issues finding a good optimum.

thompsonmj · July 27, 2022, 2:15pm

Regarding the docs,

Of course, there’s always a tradeoff between accuracy and efficiency, so play around to find out what’s right for your problem.

For larger and higher dimensinal problems, would a good appraoch be to optimize with default tolerances, then iteratively solve with tighter tolerances to see if optimization solutions converge? I feel like this could be ‘baked in’ somehow…

ChrisRackauckas · July 27, 2022, 2:35pm

Not necessarily, because a bad gradient can throw the optimization off and out. So you may want to do such an iterative approach, but you want to make sure that you start with a low enough tolerance still.

Topic		Replies	Views
PEtab with MethodOfLines Optimization (Mathematical) question	4	160	April 8, 2024
PDE Parameter Estimation Optimization (Mathematical) question , diffeq , optimization , modelingtoolkit	5	185	November 12, 2024
Derivative of user-defined function with variables as arguments in a PDE system General Usage question , package , diffeq , modelingtoolkit , methodoflines	1	273	March 21, 2024
PDE with neural network and derivative of NN using MethodOfLines Modelling & Simulations modelingtoolkit , methodoflines	2	185	June 15, 2024
Issue with Differential Equation Solver General Usage question , package , diffeq	35	984	January 13, 2024

Parameter optimization of MethodOfLines PDE

Related topics