Thank you so much - my iteration is trying to find the optimal value of a parameter that minimises a function, e.g., argmin_{w} f(X, w, a). I need the negative inverse of the Hessian to approximate the (posterior) distribution of w, *at least* in the final iteration. In the intermediate iterations I only need the diagonal of the negative inverse Hessian, which is required to optimise the other parameter, a.

The Hessian with respect to w has a particular structure, (X’X + Diagonal(d)). Unfortunately the dimensions are huge - X is 30,000 by 2,000,000 and w is 2,000,000 by 1 - which makes it impossible to compute the diagonal of the negative inverse Hessian directly. I therefore exploited the Woodbury identity, (A + BD^-1C)^-1 = A^-1 - A^-1B(D + CA^-1B)^-1CA^-1, taking A to be the diagonal matrix Diagonal(d), B to be X’, C to be X, and D to be the identity matrix, so that the inversion is carried out on a matrix of much smaller scale, (I + X * Diagonal(1 ./ d) * X’). From the inverse of that small matrix I can then recover the diagonal of the full inverse Hessian.
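To make the Woodbury step concrete, here is a minimal NumPy sketch of the diagonal computation, using hypothetical toy dimensions (n = 30, p = 200) in place of the real 30,000 by 2,000,000 problem. The key point is that only the n-by-n matrix M = I + X D^-1 X’ is ever inverted, and the diagonal of the p-by-p correction term is accumulated column-wise without forming any p-by-p matrix:

```python
import numpy as np

# Hypothetical toy dimensions for illustration; in the real problem
# X is 30,000 x 2,000,000 and d has 2,000,000 entries.
rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.normal(size=(n, p))
d = rng.uniform(1.0, 2.0, size=p)

# Direct route (infeasible at full scale): diagonal of (Diagonal(d) + X'X)^{-1}
H = np.diag(d) + X.T @ X
direct_diag = np.diag(np.linalg.inv(H))

# Woodbury route:
# (D + X'X)^{-1} = D^{-1} - D^{-1} X' (I + X D^{-1} X')^{-1} X D^{-1}
Dinv_Xt = X.T / d[:, None]            # p x n, equals D^{-1} X'
M = np.eye(n) + X @ Dinv_Xt           # n x n small matrix to invert
Minv = np.linalg.inv(M)

# i-th diagonal of the correction: row_i(D^{-1}X') @ M^{-1} @ row_i(D^{-1}X')',
# computed for all i at once without ever forming a p x p matrix.
correction = np.einsum('ij,jk,ik->i', Dinv_Xt, Minv, Dinv_Xt)
woodbury_diag = 1.0 / d - correction

print(np.allclose(direct_diag, woodbury_diag))  # both routes agree on the toy problem
```

Per diagonal entry this costs O(n^2) after the one-off O(n^2 p) cost of forming M, which is what makes the 2,000,000-dimensional diagonal tractable.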

In the final iteration, I need the entire negative inverse Hessian, but it would be of much smaller size.

Thank you so much for taking the time to read this - is there a better approach? I considered BFGS, but storing the full (approximate) inverse Hessian at each step is also prohibitively expensive.