As an additional constraint, the Jacobian of my problem is not available and cannot be computed using automatic differentiation.
I am looking for package in Julia similar to MATLAB’s lsqnonlin, which uses Trust Region Reflective (TRR) algorithm, but I have not found any package in Julia that fits completely with all my requirements. Does anybody know if there is one suited for my purpose?
Also, if there is any optimization algorithm better than TRR for my purpose (and is implemented in a Julia package) I would love to hear about it, so I am also open to your suggestion and diving deeper in optimization.
Finally, I would use MATLAB but I already have all the code in Julia, the function evaluation is much faster in Julia and I am planning in uploading the code to HPC cluster where MATLAB is not available.
@adienes Thanks, but my problem is not a quadratic form but a strongly non-linear one, so that package and the approximation does not fit my needs, but thank you anyway!
@WalterMadelim The function f is the normal derivative along the boundary of the solution of the Helmholtz equation, which is computed numerically using the Boundary Integral Equation Method. x is the parametrization of the boundary in a certain part of the boundary. As you can see, the problem is strongly non-linear and the computation of the jacobian either analytically or using automatic differentiation is not an option.
@langestefan Yes, my function admits finite differences (this is how MATLAB lsqnonlin works). I will check the package you propose and the technique you mention. They look interesting and may have potential, thank you very much.
You can also simply use derivative-free optimization (DFO) methods like PRIMA.jl or BOBYQA in NLopt.jl (which often internally construct gradient estimates, but don’t incur the cost of a full finite-difference approximation at each step).
What is the dimension n of your parameter space x \in \Omega \subseteq \mathbb{R}^n? The basic problem with both finite differences and other DFO methods is that they incur costs that are \gtrsim n \times {} the cost of f(x), so they are impractical when f is expensive and n is large.
(Depending on what optimization algorithm you are using, you typically need at most the gradient of your objective function, not the full Jacobian of f, though the latter is needed for Levenberg–Marquardt/Gauss–Newton methods.)
AD may not work, but computation analytically is absolutely an option. You just have to learn how to take derivatives at a higher level of abstraction than in first-year calculus — we have a whole course 18.063: Matrix Calculus about this at MIT, with online course notes, that may be helpful to you.
One tricky issue with a BEM setting like this is that, as the shape of your boundary changes, you might need to re-mesh, which means that the discretized solution may change non-differentiably. A good solution is a “differentiate-then-discretize” approach in which you differentiate as if you were solving the exact surface-integral equations (SIE), which gives you an “adjoint SIE” that you solve to obtain the gradient. Then you discretize this adjoint problem similarly to your original SIE, and the gradient is correct up to discretization error, which is typically good enough for optimization.
Obtaining an adjoint problem is also known as “reverse mode” or “backpropagation” differentiation, and has the advantage that computing the gradient costs only a single additional SIE solve regardless of n, so it becomes feasible to do shape optimization over a huge number of parameters — this is the only practical approach in high dimensions. A good reference for this in the SIE/BEM setting is this paper, which obtains shape derivatives as integrals over boundary data in a very general way:
(I’m not just speaking hypothetically here: We’ve implemented high-dimensional optimization of BEM solutions with respect to boundary shape via adjoint methods several times in my group, e.g. in Yao et al. (2020), and you can find papers on similar topics by many different authors.)
(If you want to use Levenberg–Marquardt methods, you need more than just the gradient. However, you can do L–M in a matrix–free fashion with e.g. Newton–Krylov iterations, which only require Jacobian–vector and vector–Jacobian products, the former of which use forward-mode differentiation and the latter uses reverse-mode, both of which can be implemented analytically in a differentiate-then-discretize style. There is another matrix-free L–M method described in Kandel et al. (2021). But I would start with just computing the gradient and using ordinary gradient-based optimization.)
Thank you very much for your answer. I will give a close look to the packages you mention and try with them.
The number of variables n is around 30, so is a relatively high number. I know the techniques you mention. Actually, I am very familiar with methods for shape optimization like the Frechet derivative, the shape derivative or the topological derivative, but I am trying to optimize the problem using standard optimization in order to compare the output provided by various methods.
This is not too terrible for derivative-free optimization (or finite differences) — probably at least an order of magnitude slower than if you had the gradient analytically, but you can afford an order of magnitude if your solver is fast enough (e.g. it is a 2d BEM problem). Gradients only become indispensable when you have thousands to millions (or more) of parameters, as in topology optimization.