Almost but not quite univariate trend filtering / compressive sensing

I need to minimize over x a subproblem of the form (I believe it is convex, judging from plots I have made, but I don't understand \ell_1 penalties very well)

\|y - Ax\|_2^2 + \lambda^2 \|Dx\|_1

where D is a first-difference operator and x is piecewise constant. Though this looks like univariate trend filtering or the LASSO, A is not the identity, and I actually have a sensible means of selecting \lambda^2 when sweeping through its values. When D is the identity it does reduce to the LASSO, and coordinate descent works quite well for the \sim 50-dimensional x that I have. I think I have a coordinate descent algorithm for the problem above as well, but the step that checks the distance between adjacent elements of x and sets them equal is kludgy.
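For concreteness, here is a minimal sketch of the objective (a rough illustration; the function names are mine and D is built as a sparse first-difference matrix):

```julia
using LinearAlgebra, SparseArrays

# First-difference operator for a length-n signal: (D*x)[i] = x[i+1] - x[i].
first_difference(n) = spdiagm(0 => -ones(n), 1 => ones(n - 1))[1:n-1, :]

# The objective ||y - A*x||^2 + λ² * ||D*x||_1 that I want to minimize over x.
objective(x, y, A, D, λ²) = sum(abs2, y - A * x) + λ² * norm(D * x, 1)
```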

Are there any pointers on how to solve this problem? ADMM? Something else? Are there frameworks in Julia for the above? Thanks.

This problem is often called “total variation denoising”, with your \|Dx\|_1 called the total variation and sometimes written \|x\|_{TV}. That’s a good place to start looking.

Note that you can solve this via a change of variables. Define z = [x_1, (x_2-x_1), (x_3-x_2), \ldots]^\top = [x_1, (Dx)^\top]^\top and define B such that Bz = Ax; the problem then resembles the standard LASSO in the variable z. Note, however, that you will need to change your \ell_1 penalty to \|\Lambda z\|_1 so that you avoid penalizing the x_1 term.
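In code, that substitution might look like this (a sketch; C is the lower-triangular cumulative-sum matrix, so x = Cz undoes the differencing):

```julia
using LinearAlgebra

n = size(A, 2)                    # dimension of x
C = LowerTriangular(ones(n, n))   # x = C*z, i.e. x[i] = z[1] + ... + z[i]
B = A * C                         # so that B*z = A*x
Λ = Diagonal([0.0; ones(n - 1)])  # identity with a 0 on the z_1 = x_1 entry

# The problem becomes: minimize ||y - B*z||^2 + λ² * ||Λ*z||_1 over z,
# i.e. a LASSO in z, except that z_1 is left unpenalized.
```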

Alternatively, here’s one paper (see Section II.C) that solves a similar optimization (with a Poisson negative log-likelihood rather than a quadratic data term) using majorization-minimization. MM is nice because it has very few parameters to set, although it’s not as fast as other methods.

Hopefully someone else can point you to a nice solver (maybe one that already exists in Julia or could easily be adapted). Otherwise you could adapt the one in the linked paper.
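For what it's worth, since x is only ~50-dimensional, a generic conic solver may well be fast enough. A sketch using Convex.jl with SCS as the back-end (assuming A, y, D, and \lambda^2 are already defined):

```julia
using Convex, SCS

# Solve min_x ||y - A*x||^2 + λ² * ||D*x||_1 directly as a convex program.
x = Variable(size(A, 2))
problem = minimize(sumsquares(y - A * x) + λ² * norm(D * x, 1))
solve!(problem, SCS.Optimizer)
x_hat = evaluate(x)
```

This avoids the change of variables entirely, at the cost of not exploiting any problem structure.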

Thanks for your reply. I did think this was similar to total variation denoising, except that I don’t have as many parameters x_i as observations y_i; if I did, it would be the same as the trend filtering problem (though with non-trivial complications in D if x is an image).

Nice suggestion about the change of variables. By the \Lambda operator, do you mean the identity with the first row removed, to avoid regularizing z_1 = x_1? Wouldn’t this change the LASSO? I guess I could simply leave z_1 unpenalized in the coordinate descent.

If I don’t get any more suggestions happy to accept your answer.

Yes, \Lambda was just meant to be the identity, except with a 0 in the entry corresponding to x_1.

You could experiment with setting x_1 = \eta z_1 for some large \eta, so that a very small z_1 yields a large x_1 and the penalty on z_1 becomes negligible. This isn’t an exact solve, but for a sufficiently large \eta it’s very close (though making \eta too large might cause convergence/stability issues). You would then be able to approximate this using the standard LASSO formulation.
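A sketch of that rescaling, continuing the B and z notation from the change of variables above (the value of \eta and the recovery step are illustrative):

```julia
η = 1e4                  # large enough that the penalty on z_1 is negligible
B_scaled = copy(B)
B_scaled[:, 1] .*= η     # since x_1 = η*z_1, the first column of B absorbs η

# ... solve a standard LASSO: minimize ||y - B_scaled*z||^2 + λ² * ||z||_1 over z,
# then recover x by undoing the scaling and the differencing:
# x = C * [η * z[1]; z[2:end]]
```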
