Optimisation over grid of parameters with `Optimization.jl`

I am trying to solve a low-dimensional (box-)constrained optimisation problem over a large grid of parameters. Both the objective and the box constraints depend on the parameters. I can assume that the optimisation problem is relatively well behaved.

Something like $\min_{x \in \mathcal{B}(p)} f(x; p)$, where $\mathcal{B}(p) = [a_1, b_1] \times [a_2, b_2(p)]$ and $b_2$ is a continuous function of the parameters $p \in \mathbb{R}^2$.

Furthermore, I know that $f(x; p)$ is smooth in the parameters, so that the minimiser $x^{\star}(p) = \arg\min_{x \in \mathcal{B}(p)} f(x; p)$ changes smoothly with $p$.

Below is a MWE with an implementation that closely resembles my current one.

using BenchmarkTools
using StaticArrays
using Base.Threads
using Optimization, OptimizationOptimJL

# Mutable so that maximiser[idx] .= sol.u below can update the entries in place
mutable struct Vec{T} <: FieldVector{2, T}
    x::T
    y::T
end

StaticArrays.similar_type(::Type{<:Vec}, ::Type{T}, s::Size{(2,)}) where T = Vec{T}
Base.similar(::Type{<:Vec}, ::Type{T}) where T = Vec(zero(T), zero(T))

function rosenbrock(v, p)
    (p[1] - v.x)^2 + p[2] * (v.y - v.x^2)^2
end

function optimovergrid!(maximiser::Matrix{V}, pgrid) where {T, V <: Vec{T}}
    fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
    lb = Vec{T}(-1, -1)

    @inbounds @threads for idx in CartesianIndices(pgrid)
        pᵢ = pgrid[idx]
        # The upper bound of the box depends on the parameters
        ub = Vec{T}(Inf, 0.5 + pᵢ[1] * pᵢ[2])
        # Start from whatever is currently stored in maximiser
        v₀ = maximiser[idx]
        prob = OptimizationProblem(fn, v₀, pᵢ; lb = lb, ub = ub)
        sol = solve(prob, GradientDescent())

        maximiser[idx] .= sol.u
    end

    return maximiser
end

n = 101
pgrid = [ (p₁, p₂) for p₁ in range(0, 1; length = n), p₂ in range(0, 1; length = n) ]
maximiser = [ similar(Vec{Float64}) for idx in CartesianIndices(pgrid) ]

optimovergrid!(maximiser, pgrid)
@benchmark optimovergrid!($maximiser, $pgrid)

Unfortunately, this allocates a lot and is a bit slow. I am not exploiting the smoothness of f in the parameters, and I am not reusing the information in prob. Does anybody see a better way of doing this, or some way I could optimise the code?
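
To be concrete, this is roughly what I mean by reusing prob and exploiting the smoothness: build the problem once, remake it at every grid point, and warm-start each solve from the neighbouring minimiser. This is only a sketch; I am assuming here that remake accepts u0, p, lb and ub for an OptimizationProblem.

function optimovergrid_warmstart!(maximiser::Matrix{V}, pgrid) where {T, V <: Vec{T}}
    fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
    lb = Vec{T}(-1, -1)

    pstart = pgrid[1]
    prob = OptimizationProblem(fn, maximiser[1], pstart; lb = lb, ub = Vec{T}(Inf, 0.5 + pstart[1] * pstart[2]))

    @threads for j in axes(pgrid, 2)
        for i in axes(pgrid, 1)
            pᵢ = pgrid[i, j]
            ub = Vec{T}(Inf, 0.5 + pᵢ[1] * pᵢ[2])
            # Warm start from the previous minimiser in the column, relying on smoothness in p
            v₀ = i > 1 ? maximiser[i - 1, j] : maximiser[i, j]
            sol = solve(remake(prob; u0 = v₀, p = pᵢ, lb = lb, ub = ub), GradientDescent())
            maximiser[i, j] .= sol.u
        end
    end

    return maximiser
end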

Well, a few things. This is, like another thread going right now, calling for SimpleOptimization.jl. It's not registered yet but there is an LBFGS in there:

and if you use that, then you can make your u0 a static array with a quasi-Newton method, which will converge much faster than GradientDescent. But secondly, you could then use this with KernelAbstractions to GPU-parallelize over the different parameters. It would look just like the example of doing this with (Simple)NonlinearSolve.jl:
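
Roughly something like the following untested sketch, with one solve per work-item over the parameter array. The name solve_grid! is just for illustration, out and ps are assumed to already live on the chosen backend, and it assumes the SimpleOptimization.jl solve is kernel-compatible in the same way the SimpleNonlinearSolve.jl solvers are:

using KernelAbstractions, StaticArrays
using Optimization, SimpleOptimization

# One box-constrained solve per work-item over the parameter grid (untested sketch).
@kernel function solve_grid!(out, @Const(ps), fn, u0)
    i = @index(Global, Linear)
    p = ps[i]
    lb = SVector(-1.0, -1.0)
    ub = SVector(Inf, 0.5 + p[1] * p[2])
    prob = OptimizationProblem(fn, u0, p; lb = lb, ub = ub)
    out[i] = solve(prob, SimpleLBFGS()).u
end

# out: vector of SVector{2,Float64}, ps: vector of parameter tuples, both on the backend
backend = CPU()                  # swap for CUDABackend(), ROCBackend(), ...
kernel! = solve_grid!(backend)
kernel!(out, ps, fn, u0; ndrange = length(ps))
KernelAbstractions.synchronize(backend)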

I know this at least has worked because it’s one half of this work:

https://openreview.net/pdf?id=nD10o1ge97

I.e., ParallelParticleSwarms.jl has a hybrid algorithm that first runs an asynchronous particle swarm for many steps, and then finalizes by doing an LBFGS from every particle to finish the global optimization as a multi-start type of method. Because of this, we know that you can GPU-parallelize the SimpleOptimization.jl LBFGS kernel, because that is exactly how the last step is done. It's currently not documented, so use at your own risk etc. etc.; I plan to get that stuff documented and released later this fall, but since it already has benchmarks and paper examples, it's already at least usable if you're willing to give it a shot.


Thank you! SimpleOptimization.jl seems to be exactly what I am looking for.

I am giving it a go now on the MWE. It seems really fast, but I often hit the maximum number of iterations.

On the speed: how is it possible? Does it just avoid line searches?

On the convergence issue: to give some context, I am defining a mutable MutVec for the Optim implementation, which requires mutation, and a regular immutable Vec for the SimpleOptimization one:

using BenchmarkTools
using StaticArrays
using Optimization, OptimizationOptimJL
using SimpleOptimization

# Define vector types
struct Vec{T} <: FieldVector{2, T}
    x::T
    y::T
end

mutable struct MutVec{T} <: FieldVector{2, T}
    x::T
    y::T
end

When I test the two solvers I get the following…

fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
p = (0.1, 100.)
v₀ = Vec(0., 0.)
prob = OptimizationProblem(fn, v₀, p; lb = Vec(-1., -1.), ub = Vec(1., 0.5 * p[1]))
maxiters = 10_000

@btime solve($prob, $(SimpleLBFGS()); maxiters = $maxiters); # 1.952 μs (87 allocations: 2.97 KiB)
sol = solve(prob, SimpleLBFGS(); maxiters = maxiters)

mv₀ = MutVec(0., 0.)
mutprob = OptimizationProblem(fn, mv₀, p; lb = MutVec(-1., -1.), ub = MutVec(1., 0.5 * p[1]))

@btime solve($mutprob, $(LBFGS())) # 74.934 μs (1353 allocations: 56.08 KiB)
mutsol = solve(mutprob, LBFGS())

@assert maximum(abs2, mutsol.u .- sol.u) < 1e-5

Clearly the implementation is very efficient, and it is cool that one can use immutable static arrays. However, when I test it on the grid I often get MaxIters return codes, no matter how much I increase maxiters. What is odd is that there seems to be no pattern in the parameters for which this occurs!

n = 100
p₁space = range(0, 1; length = n)
p₂space = range(50, 150; length = n + 1)
pgrid = Iterators.product(p₁space, p₂space)

failedoptimisation = fill(0, size(pgrid))
for (i, p) in enumerate(pgrid)
    prob = OptimizationProblem(fn, v₀, p; lb = Vec(-1., -1.), ub = Vec(1., 0.5 * p[1]))

    sol = solve(prob, SimpleLBFGS(); maxiters = maxiters)

    if !SciMLBase.successful_retcode(sol)
        failedoptimisation[i] = 1
        @warn "Error at $p: $(sol.retcode)"
    end
end

sum(failedoptimisation) / length(failedoptimisation) # 0.1
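
For now, the workaround I am considering is to simply retry the points that hit MaxIters with the Optim LBFGS from above (a sketch, not tested on the full grid):

# Fallback: retry the failed grid points with Optim's box-constrained LBFGS.
for (i, p) in enumerate(pgrid)
    failedoptimisation[i] == 1 || continue
    mutprob = OptimizationProblem(fn, MutVec(0., 0.), p;
                                  lb = MutVec(-1., -1.), ub = MutVec(1., 0.5 * p[1]))
    retried = solve(mutprob, LBFGS(); maxiters = maxiters)
    @info "Retry at $p: $(retried.retcode)"
end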

You can find the full MWE and my Julia environment status in this gist. If you need help working on this, let me know; I am happy to look into the issue.

P.S. In the package you have

import DiffEqBase: AbsSafeBestTerminationMode

This gives me the error

solve(prob, SimpleBFGS(); maxiters = maxiters) # ERROR: UndefVarError: `AbsSafeBestTerminationMode` not defined in `SimpleOptimization`

Has AbsSafeBestTerminationMode been moved by any chance?

Regarding ParallelParticleSwarms.jl, would the GPU backend work with Turing? We are publishing a paper with a student of mine (it should hit the arXiv either tomorrow or Monday) where we do tons of optimizations in a scenario like the one considered here (i.e. running best fits while changing the value of a parameter over a grid). Using CPUs was fine (we use distributed computing), but we have access to some GPU resources, so it would be fun to play with this in the future.

Might be a bug. Still not quite ready for first release.

NonlinearSolveBase.jl
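
That is, presumably the import becomes:

import NonlinearSolveBase: AbsSafeBestTerminationMode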

What do you mean, like using Turing in a KernelAbstractions kernel?