Parameter optimization of MethodOfLines PDE

I’d like to optimize parameters in the MOL example shown here with an RMSD cost function to minimize.

Following a WIP example here to build the OptimizationFunction, I have the following, but the results come back predicting the max of the bounded range.

using ModelingToolkit, MethodOfLines, OrdinaryDiffEq, DomainSets
using Plots
using DifferentialEquations, Optimization, OptimizationPolyalgorithms, Zygote, OptimizationOptimJL
using SciMLSensitivity
using Statistics

@parameters t x
p = @parameters Dn, Dp
@variables u(..) v(..)
Dt = Differential(t)
Dx = Differential(x)
Dxx = Differential(x)^2

eqs  = [Dt(u(t,x)) ~ Dn * Dxx(u(t,x)) + u(t,x)*v(t,x),
        Dt(v(t,x)) ~ Dp * Dxx(v(t,x)) - u(t,x)*v(t,x)]
bcs = [u(0,x) ~ sin(pi*x/2),
       v(0,x) ~ sin(pi*x/2),
       u(t,0) ~ 0.0, Dx(u(t,1)) ~ 0.0,
       v(t,0) ~ 0.0, Dx(v(t,1)) ~ 0.0]

domains = [t ∈ Interval(0.0,1.0),
           x ∈ Interval(0.0,1.0)]

@named pdesys = PDESystem(eqs,bcs,domains,[t,x],[u(t,x),v(t,x)],[Dn=>0.5, Dp=>2.0])

discretization = MOLFiniteDifference([x=>0.1],t)

pde_prob = discretize(pdesys,discretization)

sol = solve(pde_prob,Tsit5(),saveat=1.0)
meas = sol.u[2] .+ 0.01*randn(18)

idcs = Int.(ModelingToolkit.varmap_to_vars([p[1]=>1, p[2]=>2], p))

function loss(θ)
    new_pde_prob = remake(pde_prob,p=θ[idcs])
    sol = solve(new_pde_prob, Tsit5())
    return sqrt(mean( (sol.u[2] .- meas).^2 )), Array(sol)
end

ad_type = Optimization.AutoZygote()
opt_func = OptimizationFunction((x,p)->loss(x), ad_type)

guess = [1.0,1.0]

opt_prob = OptimizationProblem(opt_func, guess, lb=[0.1,0.1],ub=[4.0,4.0])

res = Optimization.solve(opt_prob, GradientDescent())

Here’s the cost landscape in the bounded region:


And here’s the predicted parameters:

u: 2-element Vector{Float64}:
 3.999999997189557
 3.9999999981732874

Did you try decreasing the tolerance on the solve to improve the gradient accuracy? Did you try optimizers other than GradientDescent?

I tried of SAMIN, ParticleSwarm, NelderMead, GradientDescent, ConjugateGradient, SimmulatedAnnealing, and BFGS and set solve kwargs to x_tol=1e-48,f_tol=1e-48,g_tol=1e-48. Also tried different inital guess values. Might also be worth mentioning this warning when trying to use abstol for each of these solvers.

 Warning: common abstol is currently not used by Fminbox{BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, Float64, Optim.var"#49#51"}(BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}(LineSearches.InitialStatic{Float64} 

The results often come back at the max of the bounds or simply wrong answers.

Might be something related to this issue?

Seems related. Did you try NLopt algorithms? It seems you only tried Optim.jl methods.

Cycled through a number of the NLopt options as well and no successes using res = Optimization.solve(opt_prob, NLopt.ALG_NAME()).

Most go to ~[4.0, 4.0] or max of ub, a couple settle on solutions greater than ub, and a couple settle on other incorrect solutions in the bounds. A few like LN_AUGLAG(), LD_AUGLAG(), and LD_AUGLAG_EQ() actually kill the Julia sesson and leave:

The terminal process "C:\Users\thomp\AppData\Local\Programs\Julia-1.7.1\bin\julia.exe '-i', '--banner=no', '--project=C:\Users\thomp\.julia\environments\v1.7', 'c:\Users\thomp\.vscode\extensions\julialang.language-julia-1.6.28\scripts\terminalserver\terminalserver.jl', '\\.\pipe\vsc-jl-repl-794293b1-558a-425f-a45a-e15fdcdb0fe9', '\\.\pipe\vsc-jl-cr-5b783b6a-1591-42ad-b686-ad595474f7b0', 'USE_REVISE=true', 'USE_PLOTPANE=true', 'USE_PROGRESS=true', 'ENABLE_SHELL_INTEGRATION=false', 'DEBUG_MODE=false'" terminated with exit code: 1.

Several gradient and global solvers took too long and I pulled their plug after several minutes.

No, not those tolerances. The abstol and reltol of the ODE solve.

1 Like

I asked on github as well, but… Are you sure it’s not optimal to go to the boundaries then, if all optimizers do?

This got the expected values for the optimization!

# ...

sol = solve(pde_prob,Tsit5(),saveat=1.0,abstol=1e-64)

# ...

function loss(θ)
    new_pde_prob = remake(pde_prob,p=θ[idcs])
    sol = solve(new_pde_prob, Tsit5(), saveat=1.0, abstol=1e-64)
    return sqrt(mean( (sol.u[2] .- meas).^2 )), Array(sol)
end

With NLopt and Optim methods (including ParticleSwarm) agreeing on parameters for cost against noise-added:

u: 2-element Vector{Float64}:
 0.48967039327895784
 2.7541826258158166

Higher values for abstol work too, but this was the fix I needed.

I refer those struggling similarly here.

Awesome.

Yeah I think the thing most people don’t realize is that AD is always considered “exact”, but AD of an equation solver produces an equation to solve, and so your gradient is only as exact as the solve of that extra equation. By default (there is a way to modify this) the sensitivity equations solve at the same tolerance of the forward solve, so you get gradients which “are as accurate” as the forward solve (subject to usual issues when talking about heuristic accuracy). This means that if you solve forward with a relative tolerance of 1e-3, you have a relative tolerance on the global solution of around 1e-1 (since tolerance in equation solvers is local not global, as described in that FAQ. This is a general thing for all solvers in all languages BTW so a general thing beyond Julia as well), and thus you also have a global gradient accuracy of around 1e-1.

This will definitely make gradient-based methods perform weird, which means you need a lower tolerance when doing optimization, if not for the solution itself, at least for the gradient accuracy. If using a non-gradient based approach like particle swarm, it can still be helpful because inaccuracies in the solve present itself as akin to stochasticity in the loss function, and many optimizers can have issue with stochasticity. That’s likely what caused particle swarm to have issues finding a good optimum.

1 Like

Regarding the docs,

Of course, there’s always a tradeoff between accuracy and efficiency, so play around to find out what’s right for your problem.

For larger and higher dimensinal problems, would a good appraoch be to optimize with default tolerances, then iteratively solve with tighter tolerances to see if optimization solutions converge? I feel like this could be ‘baked in’ somehow…

Not necessarily, because a bad gradient can throw the optimization off and out. So you may want to do such an iterative approach, but you want to make sure that you start with a low enough tolerance still.