Design preconditioner for unconstrained optimization problem


I am working on the greedy algorithm developed in Baptista et al. ( to approximate the components of a transport map S (aka normalizing flow) that pushes forward a target density to a reference density. They consider lower triangular transport maps, i.e. the $k$th component is only a function of the first k dimensions.

The estimation of S relies on the resolution of decoupled unconstrained optimization problems (for the different components of S). For S_k the k-th component of the map, we have to solve for a coefficient vector c_k \in \mathbb{R}^{m} with m \sim 5-40. m is the number of features in the expansion. I am using L-BFGS in Optim.jl to solve this problem.

At the j th iteration of the greedy algorithm, we add a new feature \psi^{(j)} (a tensor product of univariate Hermite functions) to the library of features (based on some heuristics), and reoptimize the new coefficient vector c_k^{(j)} (the dimension of c_k^{(j)} increases by one, i.e. \text{dim}(c_k^{(j)}) = j = \text{dim}(c_{k-1}^{(j)}) + 1 ).

To accelerate the optimization of c_k^{(j)}, I would like to design a preconditioner. So far, the optimization is preconditioned by the outer product of the Jacobian with itself. The Jacobian is evaluated at the initial guess of the coefficient vector c_k^{(j)} = [c_k^{(j-1)}; 0].

Note: the component S_k is given by a nonlinear transformation of the expansion [\psi^{(1)}(\boldsymbol{x}); \ldots; \psi^{(m)}(\boldsymbol{x})] c_k^{(j)} to ensure some monotonicity constraint.

For a large number of features, this preconditioner is oftenly ill-conditioned. I have added an \alpha I term to regularize it. What are some possible alternatives?

What do you think about using the estimate of the inverse Hessian from the L-BFGS algorithm at step j-1, and complete the matrix with a one entry in the diagonal for the new feature?