Optimisation over grid of parameters with `Optimization.jl`

I am trying to solve a low-dimensional (box-)constrained optimisation problem over a large grid of parameters. Both the objective and the box constraints depend on the parameters. I can assume that the optimisation problem is relatively well behaved.

Something like $\min_{x \in \mathcal{B}(p)} f(x; p)$, where $\mathcal{B}(p) = [a_1, b_1] \times [a_2, b_2(p)]$ and $b_2$ is a continuous function of the parameters $p \in \mathbb{R}^2$.

Furthermore, I know that $f(x; p)$ is smooth in the parameters, so that the minimiser $x^{\star}(p) = \arg\min_{x \in \mathcal{B}(p)} f(x; p)$ changes smoothly with $p$.

Below is a MWE with an implementation that closely resembles my current one.

using BenchmarkTools
using StaticArrays
using Base.Threads
using Optimization, OptimizationOptimJL

# Mutable so that maximiser[idx] .= sol.u below can update the entries in place
mutable struct Vec{T} <: FieldVector{2, T}
    x::T
    y::T
end

StaticArrays.similar_type(::Type{<:Vec}, ::Type{T}, s::Size{(2,)}) where T = Vec{T}
Base.similar(::Type{<:Vec}, ::Type{T}) where T = Vec(zero(T), zero(T))

function rosenbrock(v, p)
    (p[1] - v.x)^2 + p[2] * (v.y - v.x^2)^2
end

function optimovergrid!(maximiser::Matrix{V}, pgrid) where {T, V <: Vec{T}}
    fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
    lb = Vec{T}(-1, -1)

    @inbounds @threads for idx in CartesianIndices(pgrid)
        pᵢ = pgrid[idx]
        # The upper bound of the box depends on the parameters
        ub = Vec{T}(Inf, 0.5 + pᵢ[1] * pᵢ[2])
        # Start from whatever is currently stored in maximiser
        v₀ = maximiser[idx]
        prob = OptimizationProblem(fn, v₀, pᵢ; lb = lb, ub = ub)
        sol = solve(prob, GradientDescent())

        maximiser[idx] .= sol.u
    end

    return maximiser
end

n = 101
pgrid = [ (p₁, p₂) for p₁ in range(0, 1; length = n), p₂ in range(0, 1; length = n) ]
maximiser = [ similar(Vec{Float64}) for idx in CartesianIndices(pgrid) ]

optimovergrid!(maximiser, pgrid)
@benchmark optimovergrid!($maximiser, $pgrid)

Unfortunately, this allocates a lot and is a bit slow. I am not exploiting the smoothness of f in the parameters, and I am not reusing the information in prob. Does anybody see a better way of doing this, or some way I could optimise the code?
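
To be concrete, this is roughly what I mean by reusing prob and exploiting the smoothness: build the problem once, remake it at every grid point, and warm-start each solve from the neighbouring minimiser. This is only a sketch; I am assuming here that remake accepts u0, p, lb and ub for an OptimizationProblem.

function optimovergrid_warmstart!(maximiser::Matrix{V}, pgrid) where {T, V <: Vec{T}}
    fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
    lb = Vec{T}(-1, -1)

    pstart = pgrid[1]
    prob = OptimizationProblem(fn, maximiser[1], pstart; lb = lb, ub = Vec{T}(Inf, 0.5 + pstart[1] * pstart[2]))

    @threads for j in axes(pgrid, 2)
        for i in axes(pgrid, 1)
            pᵢ = pgrid[i, j]
            ub = Vec{T}(Inf, 0.5 + pᵢ[1] * pᵢ[2])
            # Warm start from the previous minimiser in the column, relying on smoothness in p
            v₀ = i > 1 ? maximiser[i - 1, j] : maximiser[i, j]
            sol = solve(remake(prob; u0 = v₀, p = pᵢ, lb = lb, ub = ub), GradientDescent())
            maximiser[i, j] .= sol.u
        end
    end

    return maximiser
end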

Well, a few things. This is, like another thread going right now, calling for SimpleOptimization.jl. It's not registered yet but there is an LBFGS in there:

and if you use that, then you can make your u0 a static array with a quasi-Newton method, which will converge much faster than GradientDescent. But secondly, you could then use this with KernelAbstractions to GPU-parallelize over the different parameters. It would look just like the example of doing this with (Simple)NonlinearSolve.jl:
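
Roughly something like the following untested sketch, with one solve per work-item over the parameter array. The name solve_grid! is just for illustration, out and ps are assumed to already live on the chosen backend, and it assumes the SimpleOptimization.jl solve is kernel-compatible in the same way the SimpleNonlinearSolve.jl solvers are:

using KernelAbstractions, StaticArrays
using Optimization, SimpleOptimization

# One box-constrained solve per work-item over the parameter grid (untested sketch).
@kernel function solve_grid!(out, @Const(ps), fn, u0)
    i = @index(Global, Linear)
    p = ps[i]
    lb = SVector(-1.0, -1.0)
    ub = SVector(Inf, 0.5 + p[1] * p[2])
    prob = OptimizationProblem(fn, u0, p; lb = lb, ub = ub)
    out[i] = solve(prob, SimpleLBFGS()).u
end

# out: vector of SVector{2,Float64}, ps: vector of parameter tuples, both on the backend
backend = CPU()                  # swap for CUDABackend(), ROCBackend(), ...
kernel! = solve_grid!(backend)
kernel!(out, ps, fn, u0; ndrange = length(ps))
KernelAbstractions.synchronize(backend)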

I know this at least has worked because it’s one half of this work:

https://openreview.net/pdf?id=nD10o1ge97

I.e., ParallelParticleSwarms.jl has a hybrid algorithm that first runs an asynchronous particle swarm for many steps, and then finalizes by doing an LBFGS from every particle to finish the global optimization as a multi-start type of method. Because of this, we know that you can GPU-parallelize the SimpleOptimization.jl LBFGS kernel, because that is exactly how the last step is done. It's currently not documented, so use at your own risk etc. etc.; I plan to get that stuff documented and released later this fall, but since it already has benchmarks and paper examples, it's already at least usable if you're willing to give it a shot.


Thank you! SimpleOptimization.jl seems to be exactly what I am looking for.

I am giving it a go now on the MWE. It seems really fast, but I often hit the maximum number of iterations.

On the speed: how is it possible? Does it just avoid line searches?

On the convergence issue: to give some context, I am defining a mutable MutVec for the Optim implementation, which requires mutation, and a regular immutable Vec for the SimpleOptimization one:

using BenchmarkTools
using StaticArrays
using Optimization, OptimizationOptimJL
using SimpleOptimization

# Define vector types
struct Vec{T} <: FieldVector{2, T}
    x::T
    y::T
end

mutable struct MutVec{T} <: FieldVector{2, T}
    x::T
    y::T
end

When I test the two solvers I get the following…

fn = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
p = (0.1, 100.)
v₀ = Vec(0., 0.)
prob = OptimizationProblem(fn, v₀, p; lb = Vec(-1., -1.), ub = Vec(1., 0.5 * p[1]))
maxiters = 10_000

@btime solve($prob, $(SimpleLBFGS()); maxiters = $maxiters); # 1.952 μs (87 allocations: 2.97 KiB)
sol = solve(prob, SimpleLBFGS(); maxiters = maxiters)

mv₀ = MutVec(0., 0.)
mutprob = OptimizationProblem(fn, mv₀, p; lb = MutVec(-1., -1.), ub = MutVec(1., 0.5 * p[1]))

@btime solve($mutprob, $(LBFGS())) # 74.934 μs (1353 allocations: 56.08 KiB)
mutsol = solve(mutprob, LBFGS())

@assert maximum(abs2, mutsol.u .- sol.u) < 1e-5

Clearly the implementation is very efficient, and it is cool that one can use immutable static arrays. However, when I test it on the grid I often get MaxIters return codes, no matter how much I increase maxiters. What is odd is that there seems to be no pattern in the parameters for which this occurs!

n = 100
p₁space = range(0, 1; length = n)
p₂space = range(50, 150; length = n + 1)
pgrid = Iterators.product(p₁space, p₂space)

failedoptimisation = fill(0, size(pgrid))
for (i, p) in enumerate(pgrid)
    prob = OptimizationProblem(fn, v₀, p; lb = Vec(-1., -1.), ub = Vec(1., 0.5 * p[1]))

    sol = solve(prob, SimpleLBFGS(); maxiters = maxiters)

    if !SciMLBase.successful_retcode(sol)
        failedoptimisation[i] = 1
        @warn "Error at $p: $(sol.retcode)"
    end
end

sum(failedoptimisation) / length(failedoptimisation) # 0.1
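
For now, the workaround I am considering is to simply retry the points that hit MaxIters with the Optim LBFGS from above (a sketch, not tested on the full grid):

# Fallback: retry the failed grid points with Optim's box-constrained LBFGS.
for (i, p) in enumerate(pgrid)
    failedoptimisation[i] == 1 || continue
    mutprob = OptimizationProblem(fn, MutVec(0., 0.), p;
                                  lb = MutVec(-1., -1.), ub = MutVec(1., 0.5 * p[1]))
    retried = solve(mutprob, LBFGS(); maxiters = maxiters)
    @info "Retry at $p: $(retried.retcode)"
end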

You can find the full MWE and my Julia environment status in this gist. If you need help working on this, let me know; I am happy to look into the issue.

P.S. In the package you have

import DiffEqBase: AbsSafeBestTerminationMode

This gives me the error

solve(prob, SimpleBFGS(); maxiters = maxiters) # ERROR: UndefVarError: `AbsSafeBestTerminationMode` not defined in `SimpleOptimization`

Has AbsSafeBestTerminationMode been moved by any chance?

Regarding ParallelParticleSwarms.jl, would the GPU backend work with Turing? We are publishing a paper with a student of mine (it should hit the arXiv either tomorrow or Monday) where we do tons of optimizations in a scenario like the one considered here (i.e. running best fits while changing the value of a parameter over a grid). Using CPUs was fine (we use distributed computing), but we have access to some GPU resources, so it would be fun to play with this in the future.

Might be a bug. Still not quite ready for first release.

NonlinearSolveBase.jl
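
That is, presumably the import becomes:

import NonlinearSolveBase: AbsSafeBestTerminationMode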

What do you mean, like using Turing in a KernelAbstractions kernel?