Derivative-free optimization of bounded nonlinear least squares

claudiop21 · May 27, 2026, 10:32pm

Hello everybody!

I am looking for an optimization package to solve the following minimization problem:

\min_{\bf x \in \Omega} \frac{1}{2}||{\bf f} ({\bf x})||^2 + \lambda ||{\bf x}||^2,

where \Omega is a bounded domain with both restrictions of type:

{\bf x}_l < {\bf x} < {\bf x}_u\\\, A_{\mathrm{eq}} {\bf x} = {\bf b}_{\mathrm{eq}}.

As an additional constraint, the Jacobian of my problem is not available and cannot be computed using automatic differentiation.

I am looking for package in Julia similar to MATLAB’s lsqnonlin, which uses Trust Region Reflective (TRR) algorithm, but I have not found any package in Julia that fits completely with all my requirements. Does anybody know if there is one suited for my purpose?

Also, if there is any optimization algorithm better than TRR for my purpose (and is implemented in a Julia package) I would love to hear about it, so I am also open to your suggestion and diving deeper in optimization.

Finally, I would use MATLAB but I already have all the code in Julia, the function evaluation is much faster in Julia and I am planning in uploading the code to HPC cluster where MATLAB is not available.

adienes · May 27, 2026, 10:57pm

can GitHub - oxfordcontrol/Clarabel.jl: Clarabel.jl: Interior-point solver for convex conic optimisation problems in Julia. · GitHub fit your needs?

WalterMadelim · May 27, 2026, 11:41pm

since you mentioned function evaluation, I wonder what is the function f in the obj.

langestefan · May 28, 2026, 12:40am

I’m not aware of a package that can handle the problem as is. I did find Enlsip.jl which may work?

Can you use finite differencing?

Also, as an alternative strategy. You can eliminate the equality constraints by using a null space reformulation. Basically:

Compute a particular solution x_p = A_{\text{eq}}^{-1} b_{\text{eq}}. Just do x_p = A_eq \ b_eq
Find the nullspace of \text{null}(A_{\text{eq}}) = Z. Just do Z = nullspace(A_eq)
Now you have x = x_p + Z y, \qquad y \in \mathbb{R}^{n-r}.
Substitute. You should end up with

\min_{y \in \mathbb{R}^{n-r}} \tfrac{1}{2}\bigl\|f(x_p + Z y)\bigr\|^2 + \lambda\|x_p + Z y\|^2 \quad \text{s.t.}\ \ x_l - x_p \le Z y \le x_u - x_p

Now you do need to be a bit more careful with differentiating. Because now the variable is y, and \frac{\partial x}{\partial y} = Z.

With this form you can use the machinery in NonlinearSolve.jl, because the equality constraint is eliminated (see Nonlinear Problems · NonlinearSolve.jl).

claudiop21 · May 28, 2026, 10:29am

@adienes Thanks, but my problem is not a quadratic form but a strongly non-linear one, so that package and the approximation does not fit my needs, but thank you anyway!

@WalterMadelim The function f is the normal derivative along the boundary of the solution of the Helmholtz equation, which is computed numerically using the Boundary Integral Equation Method. x is the parametrization of the boundary in a certain part of the boundary. As you can see, the problem is strongly non-linear and the computation of the jacobian either analytically or using automatic differentiation is not an option.

@langestefan Yes, my function admits finite differences (this is how MATLAB lsqnonlin works). I will check the package you propose and the technique you mention. They look interesting and may have potential, thank you very much.

langestefan · May 28, 2026, 10:38am

This is also typically something @stevengj would be able to answer

stevengj · May 28, 2026, 11:46am

You can also simply use derivative-free optimization (DFO) methods like PRIMA.jl or BOBYQA in NLopt.jl (which often internally construct gradient estimates, but don’t incur the cost of a full finite-difference approximation at each step).

What is the dimension n of your parameter space x \in \Omega \subseteq \mathbb{R}^n? The basic problem with both finite differences and other DFO methods is that they incur costs that are \gtrsim n \times {} the cost of f(x), so they are impractical when f is expensive and n is large.

(Depending on what optimization algorithm you are using, you typically need at most the gradient of your objective function, not the full Jacobian of f, though the latter is needed for Levenberg–Marquardt/Gauss–Newton methods.)

AD may not work, but computation analytically is absolutely an option. You just have to learn how to take derivatives at a higher level of abstraction than in first-year calculus — we have a whole course 18.063: Matrix Calculus about this at MIT, with online course notes, that may be helpful to you.

One tricky issue with a BEM setting like this is that, as the shape of your boundary changes, you might need to re-mesh, which means that the discretized solution may change non-differentiably. A good solution is a “differentiate-then-discretize” approach in which you differentiate as if you were solving the exact surface-integral equations (SIE), which gives you an “adjoint SIE” that you solve to obtain the gradient. Then you discretize this adjoint problem similarly to your original SIE, and the gradient is correct up to discretization error, which is typically good enough for optimization.

Obtaining an adjoint problem is also known as “reverse mode” or “backpropagation” differentiation, and has the advantage that computing the gradient costs only a single additional SIE solve regardless of n, so it becomes feasible to do shape optimization over a huge number of parameters — this is the only practical approach in high dimensions. A good reference for this in the SIE/BEM setting is this paper, which obtains shape derivatives as integrals over boundary data in a very general way:

Grégoire Allaire, Charles Dapogny, François Jouve. “Shape and topology optimization.” Andrea Bonito; Ricardo H. Nochetto. Geometric partial differential equations, part II, 22, Elsevier, 2021, Handbook of Numerical Analysis, 10.1016/bs.hna.2020.10.004.

(I’m not just speaking hypothetically here: We’ve implemented high-dimensional optimization of BEM solutions with respect to boundary shape via adjoint methods several times in my group, e.g. in Yao et al. (2020), and you can find papers on similar topics by many different authors.)

(If you want to use Levenberg–Marquardt methods, you need more than just the gradient. However, you can do L–M in a matrix–free fashion with e.g. Newton–Krylov iterations, which only require Jacobian–vector and vector–Jacobian products, the former of which use forward-mode differentiation and the latter uses reverse-mode, both of which can be implemented analytically in a differentiate-then-discretize style. There is another matrix-free L–M method described in Kandel et al. (2021). But I would start with just computing the gradient and using ordinary gradient-based optimization.)

claudiop21 · May 28, 2026, 2:31pm

Thank you very much for your answer. I will give a close look to the packages you mention and try with them.

The number of variables n is around 30, so is a relatively high number. I know the techniques you mention. Actually, I am very familiar with methods for shape optimization like the Frechet derivative, the shape derivative or the topological derivative, but I am trying to optimize the problem using standard optimization in order to compare the output provided by various methods.

stevengj · May 28, 2026, 4:58pm

This is not too terrible for derivative-free optimization (or finite differences) — probably at least an order of magnitude slower than if you had the gradient analytically, but you can afford an order of magnitude if your solver is fast enough (e.g. it is a 2d BEM problem). Gradients only become indispensable when you have thousands to millions (or more) of parameters, as in topology optimization.

claudiop21 · May 28, 2026, 10:51pm

Great! I am not very familiar with optimization and I did not know what is considered a big amount. Thank you very much for your help and your suggestions. I will take a close and deep look on the recommended packages.

astrobc1 · May 31, 2026, 1:02am

You can try Nelder Mead if it’s not too many parameters. This version I created fits the entire parameter space followed by fits over pairs of parameters. And supports bounds by penalizing the objective, although I’m not sure how stable it is for a wide range of problems. It’s not registered, but feel free to try this module.

github.com/astrobc1/IterativeNelderMead.jl

src/IterativeNelderMead.jl

master

module IterativeNelderMead

using Statistics, LinearAlgebra

export NelderMeadOptions, optimize, initial_simplex, initial_state

struct NelderMeadOptions
    max_fcalls::Int
    no_improve_break::Int
    n_iterations::Int
    ftol_rel::Float64
    ftol_abs::Float64
    xtol_rel::Float64
    xtol_abs::Float64
end


function NelderMeadOptions(n_vary::Int, ;max_fcalls::Int=150000 * n_vary, no_improve_break::Int=3, n_iterations::Int=n_vary, ftol_rel::Real=1E-8, ftol_abs::Real=1E-12, xtol_rel::Real=1E-8, xtol_abs::Real=1E-12)
    @assert n_vary > 0
    return NelderMeadOptions(max_fcalls, no_improve_break, n_iterations, ftol_rel, ftol_abs, xtol_rel, xtol_abs)

This file has been truncated. show original

stevengj · May 31, 2026, 11:35am

There is almost always a better derivative-free optimization algorithm … especially for smooth functions … than Nelder–Mead, which has been obsolete for decades (and is known to fail to converge in some cases). If the function is smooth (twice differentiable, even if you don’t have access to the derivatives), I usually recommend the Powell algorithms like COBYLA or BOBYQA; see also Conn’s book Derivative-Free Optimization.

IIRC, Nelder–Mead with bounds applied naively has a tendency to run into problems due to degenerate simplices packed against the bounds.

More generally, simple penalties in the objective (barrier functions) require a lot of care as a way to implement constraints. Implemented naively, they can lead to suboptimal results (for diverging barriers that don’t allow the variables to get close enough to the constraints), allow infeasible points (for finite barriers), and/or make the optimization “stiff” (ill-conditioned Hessian) and slowly converging (for barriers that are too steep).

One good way to add constraints into an unconstrained optimization method is an augmented-Lagrangian approach, but this still needs to repeat the optimization several times in order to adaptively tune the penalty strengths.

It’s usually better to employ an optimization algorithm that is built to support the constraints that you want, in my experience. And there are lots of algorithms that support simple bound constraints, such as the Powell algorithms I mentioned above.

astrobc1 · May 31, 2026, 12:54pm

You’ve raised several valid points. No solver is best for every problem. In the case of bounds, I’ve found good behavior by using parameter transformations with NM too. However

Nelder–Mead, which has been obsolete for decades (and is known to fail to converge in some cases).

I wouldn’t call it obsolete.

I’ve worked on several curve fitting-like problems where scipy’s entire suite of solvers failed to find the optimal solution, including Powell, whereas my iterative NM finds it. But maybe an iterative Powell algorithm would perform even better.

stevengj · May 31, 2026, 5:28pm

The “Powell” method in scipy’s minimize function refers to a much older method from Powell (1964) than the BOBYQA and Prima.jl algorithms I suggested above. Powell published very different and greatly superior (in my experience) derivative-free quasi-Newton methods decades after the 1964 work.

(Powell is one of the seminal figures in the field of nonlinear optimization, so there are many different algorithms that bear his name.)

Of course, if you have a function that is non-smooth, that rules out a lot of the “nicer” methods, and you are usually well advised to reformulate your problem into a smooth NLP if possible. The other key concern is that of local vs. global minima — if your function has many local minima, and you apply a local optimization algorithm, it is largely luck + initialization that determines which local optimum is found. In such cases, you may have to resort to global optimization algorithms, often followed by local minimization to “polish” the solution accuracy. I’m guessing that you were in one of these two situations if the other solvers “failed”.

Topic		Replies	Views
How to put bounds on Nelder-Mead in Optim? Optimization (Mathematical) question	25	8250	February 2, 2024
Optimization on a manifold Optimization (Mathematical) question	33	2349	June 21, 2021
DFO algorithms for constrained optimization Optimization (Mathematical)	0	517	August 22, 2018
Constraint optimization without gradient Optimization (Mathematical)	3	2817	May 21, 2018
Comparing non-linear least squares solvers Optimization (Mathematical) nonlinear , nonlinear-optimizati	37	5144	March 5, 2024

Derivative-free optimization of bounded nonlinear least squares

Related topics