I should also note that if m \ll n, i.e. you have many fewer outputs than parameters (like a deep neural network with a zillion parameters where you are only trying to predict a small batch of training data on each step), then the problem is highly underdetermined. In this case the traditional Levenberg–Marquardt and Gauss–Newton methods are ill-suited: typical implementations form a huge n \times n matrix J^T J from the m \times n Jacobian J, which is expensive and rank-deficient. (You can avoid forming J^T J explicitly with pivoted QR-based methods, but L–M methods want J^T J + \lambda I or its equivalent, and you still end up with a large O(n^2) matrix somewhere if you're not careful. You can avoid this with the SVD of J, but that seems to be rarely used in L–M implementations, and it's overkill for typical underdetermined applications where J^T J + \lambda I is well-conditioned and SPD thanks to the regularization.) Of course, an underdetermined problem may have many solutions, but one often wants a solution close to the starting guess, and there may be no exact solutions due to the nonlinearity or due to constraints (e.g. bound constraints on \vec{x}).
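To make the rank deficiency concrete, here is a minimal pure-Python sketch (the matrix entries are invented for illustration, not taken from any real problem): for an m \times n Jacobian with m < n, the n \times n matrix J^T J has rank at most m and is therefore singular, while J^T J + \lambda I is not.

```python
# A 2x3 Jacobian J (m = 2 outputs, n = 3 parameters) gives a 3x3
# matrix J^T J of rank at most 2, hence singular; the regularized
# J^T J + lambda*I is invertible. Numbers are made up for illustration.

J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # m x n with m = 2, n = 3

def JtJ(J, lam=0.0):
    """Form J^T J + lam*I (an n x n matrix) from an m x n matrix J."""
    m, n = len(J), len(J[0])
    return [[sum(J[k][i] * J[k][j] for k in range(m))
             + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (A[0][0] * (A[1][1]*A[2][2] - A[1][2]*A[2][1])
          - A[0][1] * (A[1][0]*A[2][2] - A[1][2]*A[2][0])
          + A[0][2] * (A[1][0]*A[2][1] - A[1][1]*A[2][0]))

print(det3(JtJ(J)))       # 0.0: rank <= m = 2, so J^T J is singular
print(det3(JtJ(J, 0.5)))  # > 0: the regularization restores full rank
```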
It turns out that one can devise a variant of Levenberg–Marquardt that works well in such cases, based on the obvious idea of finding a regularized minimum-norm solution to the linearized underdetermined problem at each step, so that you work with the small (m \times m), full-rank matrix J J^T instead (you can again use LQ to avoid squaring the condition number, but once you add the L–M regularization that is less of a concern). We described this algorithm in Chen et al. (2026), appendix B, and I wouldn’t be surprised if other authors have stumbled across the same technique too.
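A hedged pure-Python sketch of the idea (my own illustration, not the authors' actual code, with made-up numbers): instead of solving the n \times n system (J^T J + \lambda I) \delta x = -J^T r, solve the small m \times m system (J J^T + \lambda I) y = -r and set \delta x = J^T y. The two steps are algebraically identical, because J^T (J J^T + \lambda I)^{-1} = (J^T J + \lambda I)^{-1} J^T.

```python
# One damped step dx = J^T (J J^T + lam*I)^{-1} (-r), specialized to
# m = 2 so a tiny Cramer's-rule solve suffices for the sketch.
def lm_step_underdetermined(J, r, lam):
    m, n = len(J), len(J[0])
    # Form the small m x m matrix A = J J^T + lam*I (SPD for lam > 0).
    A = [[sum(J[i][k] * J[j][k] for k in range(n))
          + (lam if i == j else 0.0) for j in range(m)] for i in range(m)]
    # Solve A y = -r.
    b = [-r[0], -r[1]]
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    y = [(b[0]*A[1][1] - A[0][1]*b[1]) / det,
         (A[0][0]*b[1] - b[0]*A[1][0]) / det]
    # Lift back to parameter space: dx = J^T y (length n).
    return [sum(J[i][j] * y[i] for i in range(m)) for j in range(n)]

J = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # illustrative 2x3 Jacobian
r = [1.0, -1.0]                          # current residual
lam = 0.5
dx = lm_step_underdetermined(J, r, lam)

# Check: dx satisfies the usual n x n normal equations
# (J^T J + lam*I) dx = -J^T r, without ever forming the n x n matrix.
resid = [sum(sum(J[k][i]*J[k][j] for k in range(2)) * dx[j] for j in range(3))
         + lam*dx[i] + sum(J[k][i]*r[k] for k in range(2)) for i in range(3)]
assert max(abs(v) for v in resid) < 1e-9
```

For larger m one would presumably replace the Cramer's-rule solve with a Cholesky factorization of J J^T + \lambda I, which is SPD for \lambda > 0.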
(Our application was for physics-driven topology optimization via Gridap.jl, not data-driven machine learning, but the underlying principles are generic. However, for stochastic optimization with sampled training data there is probably more work to be done to prove convergence?)
It might be nice to add this underdetermined L–M variant as an option to one of the optimization packages; I’m not sure which one, maybe NonlinearSolve.jl? Right now our code for the paper is in a LeastSquaresOptim.jl fork at Inverse-Design-of-Resonances/LeastSquaresOptim/levenberg_marquardt.jl at main · aristeidis-karalis/Inverse-Design-of-Resonances · GitHub … I filed an issue: Levenberg–Marquardt variant for underdetermined least-square problems? · Issue #851 · SciML/NonlinearSolve.jl · GitHub