How to solve a large quadratic programming problem

As soon as you form the huge matrix K^T K in this sort of problem, you lose. You really want a method that avoids forming this explicitly.

Two options to consider:

  1. You could just throw the function h(x) = \Vert g - Kx \Vert^2 + \alpha \Vert x \Vert^2 = x^T (K^T K + \alpha I) x - 2 g^T K x + g^T g at a generic gradient-based optimization algorithm with bound constraints x \succeq 0, like L-BFGS. The key point is to compute h and its gradient efficiently. Evaluate h(x) directly as the sum of the two squared norms \Vert g - Kx \Vert^2 + \alpha \Vert x \Vert^2, which requires only the product Kx and never forms K^T K + \alpha I. Compute the gradient via \nabla h = 2 [ K^T (Kx) + \alpha x ] - 2 K^T g (parentheses indicate the order of operations). Both use only matrix–vector operations, never matrix–matrix. (AD should also be capable of doing this for you.)

  2. You could implement a more specialized method that exploits the specific structure of this problem. See the thread Suggestions needed for bound-constrained least squares solver for inspiration, though you'll need to modify those solutions to account for the \alpha term without explicitly forming the huge stacked matrix [K; \sqrt{\alpha} I] (note the \sqrt{\alpha}, so that the stacked residual norm reproduces \alpha \Vert x \Vert^2).
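As a concrete sketch of option 1, here is how it might look with SciPy's L-BFGS-B (the matrix sizes, `alpha`, and random data below are placeholders for illustration; a real `K` would be large and ideally applied as a fast operator rather than a dense array):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 200, 100                     # small stand-in sizes for the example
K = rng.standard_normal((m, n))     # in practice, apply K as a fast operator
g = rng.standard_normal(m)
alpha = 0.1

def objective(x):
    r = K @ x - g                        # one matvec: Kx
    h = r @ r + alpha * (x @ x)          # ||g - Kx||^2 + alpha ||x||^2
    grad = 2.0 * (K.T @ r + alpha * x)   # = 2[K^T(Kx) + alpha x] - 2 K^T g
    return h, grad                       # no K^T K is ever formed

x0 = np.zeros(n)
res = minimize(objective, x0, jac=True, method="L-BFGS-B",
               bounds=[(0.0, None)] * n)  # enforce x >= 0
x_opt = res.x
```

Returning `(h, grad)` together with `jac=True` avoids recomputing `Kx` for the gradient; the per-iteration cost is two matrix–vector products.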