As an additional constraint, the Jacobian of my problem is not available and cannot be computed using automatic differentiation.
I am looking for package in Julia similar to MATLAB’s lsqnonlin, which uses Trust Region Reflective (TRR) algorithm, but I have not found any package in Julia that fits completely with all my requirements. Does anybody know if there is one suited for my purpose?
Also, if there is any optimization algorithm better than TRR for my purpose (and is implemented in a Julia package) I would love to hear about it, so I am also open to your suggestion and diving deeper in optimization.
Finally, I would use MATLAB but I already have all the code in Julia, the function evaluation is much faster in Julia and I am planning in uploading the code to HPC cluster where MATLAB is not available.
@adienes Thanks, but my problem is not a quadratic form but a strongly non-linear one, so that package and the approximation does not fit my needs, but thank you anyway!
@WalterMadelim The function f is the normal derivative along the boundary of the solution of the Helmholtz equation, which is computed numerically using the Boundary Integral Equation Method. x is the parametrization of the boundary in a certain part of the boundary. As you can see, the problem is strongly non-linear and the computation of the jacobian either analytically or using automatic differentiation is not an option.
@langestefan Yes, my function admits finite differences (this is how MATLAB lsqnonlin works). I will check the package you propose and the technique you mention. They look interesting and may have potential, thank you very much.
You can also simply use derivative-free optimization (DFO) methods like PRIMA.jl or BOBYQA in NLopt.jl (which often internally construct gradient estimates, but don’t incur the cost of a full finite-difference approximation at each step).
What is the dimension n of your parameter space x \in \Omega \subseteq \mathbb{R}^n? The basic problem with both finite differences and other DFO methods is that they incur costs that are \gtrsim n \times {} the cost of f(x), so they are impractical when f is expensive and n is large.
(Depending on what optimization algorithm you are using, you typically need at most the gradient of your objective function, not the full Jacobian of f, though the latter is needed for Levenberg–Marquardt/Gauss–Newton methods.)
AD may not work, but computation analytically is absolutely an option. You just have to learn how to take derivatives at a higher level of abstraction than in first-year calculus — we have a whole course 18.063: Matrix Calculus about this at MIT, with online course notes, that may be helpful to you.
One tricky issue with a BEM setting like this is that, as the shape of your boundary changes, you might need to re-mesh, which means that the discretized solution may change non-differentiably. A good solution is a “differentiate-then-discretize” approach in which you differentiate as if you were solving the exact surface-integral equations (SIE), which gives you an “adjoint SIE” that you solve to obtain the gradient. Then you discretize this adjoint problem similarly to your original SIE, and the gradient is correct up to discretization error, which is typically good enough for optimization.
Obtaining an adjoint problem is also known as “reverse mode” or “backpropagation” differentiation, and has the advantage that computing the gradient costs only a single additional SIE solve regardless of n, so it becomes feasible to do shape optimization over a huge number of parameters — this is the only practical approach in high dimensions. A good reference for this in the SIE/BEM setting is this paper, which obtains shape derivatives as integrals over boundary data in a very general way:
(I’m not just speaking hypothetically here: We’ve implemented high-dimensional optimization of BEM solutions with respect to boundary shape via adjoint methods several times in my group, e.g. in Yao et al. (2020), and you can find papers on similar topics by many different authors.)
(If you want to use Levenberg–Marquardt methods, you need more than just the gradient. However, you can do L–M in a matrix–free fashion with e.g. Newton–Krylov iterations, which only require Jacobian–vector and vector–Jacobian products, the former of which use forward-mode differentiation and the latter uses reverse-mode, both of which can be implemented analytically in a differentiate-then-discretize style. There is another matrix-free L–M method described in Kandel et al. (2021). But I would start with just computing the gradient and using ordinary gradient-based optimization.)
Thank you very much for your answer. I will give a close look to the packages you mention and try with them.
The number of variables n is around 30, so is a relatively high number. I know the techniques you mention. Actually, I am very familiar with methods for shape optimization like the Frechet derivative, the shape derivative or the topological derivative, but I am trying to optimize the problem using standard optimization in order to compare the output provided by various methods.
This is not too terrible for derivative-free optimization (or finite differences) — probably at least an order of magnitude slower than if you had the gradient analytically, but you can afford an order of magnitude if your solver is fast enough (e.g. it is a 2d BEM problem). Gradients only become indispensable when you have thousands to millions (or more) of parameters, as in topology optimization.
Great! I am not very familiar with optimization and I did not know what is considered a big amount. Thank you very much for your help and your suggestions. I will take a close and deep look on the recommended packages.
You can try Nelder Mead if it’s not too many parameters. This version I created fits the entire parameter space followed by fits over pairs of parameters. And supports bounds by penalizing the objective, although I’m not sure how stable it is for a wide range of problems. It’s not registered, but feel free to try this module.
There is almost always a better derivative-free optimization algorithm … especially for smooth functions … than Nelder–Mead, which has been obsolete for decades (and is known to fail to converge in some cases). If the function is smooth (twice differentiable, even if you don’t have access to the derivatives), I usually recommend the Powell algorithms like COBYLA or BOBYQA; see also Conn’s book Derivative-Free Optimization.
IIRC, Nelder–Mead with bounds applied naively has a tendency to run into problems due to degenerate simplices packed against the bounds.
More generally, simple penalties in the objective (barrier functions) require a lot of care as a way to implement constraints. Implemented naively, they can lead to suboptimal results (for diverging barriers that don’t allow the variables to get close enough to the constraints), allow infeasible points (for finite barriers), and/or make the optimization “stiff” (ill-conditioned Hessian) and slowly converging (for barriers that are too steep).
One good way to add constraints into an unconstrained optimization method is an augmented-Lagrangian approach, but this still needs to repeat the optimization several times in order to adaptively tune the penalty strengths.
It’s usually better to employ an optimization algorithm that is built to support the constraints that you want, in my experience. And there are lots of algorithms that support simple bound constraints, such as the Powell algorithms I mentioned above.
You’ve raised several valid points. No solver is best for every problem. In the case of bounds, I’ve found good behavior by using parameter transformations with NM too. However
Nelder–Mead, which has been obsolete for decades (and is known to fail to converge in some cases).
I wouldn’t call it obsolete.
I’ve worked on several curve fitting-like problems where scipy’s entire suite of solvers failed to find the optimal solution, including Powell, whereas my iterative NM finds it. But maybe an iterative Powell algorithm would perform even better.
The “Powell” method in scipy’s minimize function refers to a much older method from Powell (1964) than the BOBYQA and Prima.jl algorithms I suggested above. Powell published very different and greatly superior (in my experience) derivative-free quasi-Newton methods decades after the 1964 work.
(Powell is one of the seminal figures in the field of nonlinear optimization, so there are many different algorithms that bear his name.)
Of course, if you have a function that is non-smooth, that rules out a lot of the “nicer” methods, and you are usually well advised to reformulate your problem into a smooth NLP if possible. The other key concern is that of local vs. global minima — if your function has many local minima, and you apply a local optimization algorithm, it is largely luck + initialization that determines which local optimum is found. In such cases, you may have to resort to global optimization algorithms, often followed by local minimization to “polish” the solution accuracy. I’m guessing that you were in one of these two situations if the other solvers “failed”.