Primal-Dual Hybrid Gradient for L1 regularization

Replace

G1(f) = (Kf - b)'*(Kf - b) + alpha * sum(abs.(f))

with

G2(f) = (Kf - b)'*(Kf - b) + alpha * sL1(f, 1.e-7)

and send G2 to your favorite smooth optimization package. Most of them can handle the nonnegativity constraint with ease.
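To make the smoothing concrete, here is a minimal sketch; `smooth_abs` is a hypothetical helper name, and `sL1` follows the sqrt(x^2 + 4mu^2) surrogate used in the implementation later in the thread:

```julia
# Smooth surrogate for abs(x): sqrt(x^2 + 4mu^2) is differentiable at 0
# and stays within O(mu) of abs(x) everywhere.
smooth_abs(x, mu) = sqrt(x * x + 4 * mu * mu)
sL1(x, mu=1.e-7) = sum(xi -> smooth_abs(xi, mu), x)

x = [-2.0, 0.0, 3.0]
sum(abs, x)   # exact L1 norm: 5.0
sL1(x)        # smoothed value: 5.0 + O(mu)
```

With mu this small the surrogate is numerically indistinguishable from the true L1 norm away from zero, but the kink at the origin is gone, so gradient-based solvers behave.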

I have a recent paper that uses this idea for a very similar problem, but it may be more mathematical than you need.

Understood, here is an implementation:

Smooth L1 norm using OptimizationOptimJL.BFGS
using Optimization, OptimizationOptimJL, LinearAlgebra

K = rand(10, 10)
g = rand(10)
α = 1

function sL1(x::AbstractVector{T}, mu=1.e-7) where {T}
    return sum(Cabs.(x, mu))
end

function Cabs(x, mu)
    p = 4.0 * mu * mu
    return sqrt(x * x + p)
end

function G(f, p)
    # f is the optimization variable; p = (K, s, α)
    K, s, α = p
    r = K * f - s
    return r' * r + α * sL1(f, 1.e-7)
end

optf = Optimization.OptimizationFunction(G, Optimization.AutoForwardDiff())
prob = Optimization.OptimizationProblem(optf, ones(size(K, 2)),
    (K, g, α),
    lb=zeros(size(K, 2)), ub=fill(Inf, size(K, 2)))
f = OptimizationOptimJL.solve(prob, OptimizationOptimJL.BFGS())

It does converge to something, but it’s pretty slow compared to the other methods. Perhaps the way I’ve written it is far from optimal.
I assume we can remove the r allocations from the cost function, but I’m not sure what would be the best way to do it.
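One way to avoid allocating `r` is to accumulate the residual norm row by row. This is only a sketch (the name `G_noalloc` is mine, not from the post above); the accumulator type is promoted from the input element types so ForwardDiff's Dual numbers should still pass through:

```julia
# Hypothetical allocation-free variant of the cost function: the residual
# norm ||K*f - s||^2 is accumulated scalar by scalar, so no intermediate
# vector r = K*f - s is ever allocated.
function G_noalloc(f, p)
    K, s, α = p
    T = promote_type(eltype(K), eltype(f), eltype(s))
    acc = zero(T)
    @inbounds for i in axes(K, 1)
        ri = zero(T)
        for j in axes(K, 2)
            ri += K[i, j] * f[j]
        end
        ri -= s[i]
        acc += ri * ri
    end
    mu = 1.e-7
    # smoothed L1 term, same sqrt(x^2 + 4mu^2) formula as sL1/Cabs above
    return acc + α * sum(x -> sqrt(x * x + 4.0 * mu * mu), f)
end
```

Whether this actually helps depends on problem size; for a 10×10 `K` the allocation of `r` is unlikely to be the bottleneck.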

It seems you have precomputed K and b, so the allocations are not coming from that.

If something else is doing what you want and is fast enough, you should probably use that. You can invest a lot of time in the Optimization.jl docs before you figure this problem out.


I am not aware of algorithms that do constrained PDHG, so you would have to use a cost that is plus infinity for negative input and work out what its prox (and gradient) are in the PDHG setting. One rephrasing could be to work on the set of positive numbers (as a manifold), but I am super biased there, since the Chambolle-Pock implementation I linked can (among other things) do exactly that (and I implemented it). You would still need special proxes and gradients then.
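For what it's worth, the prox of the indicator of the nonnegative orthant (0 for x ≥ 0, +∞ otherwise) has a simple closed form: it is the Euclidean projection onto the orthant, i.e. clamping at zero, and the step size drops out. A minimal sketch, with `prox_nonneg` as a hypothetical name:

```julia
# prox_{λ·ι}(x) for the indicator ι of {x : x ≥ 0} is the componentwise
# projection max(x, 0); for indicator functions the step size λ is irrelevant.
prox_nonneg(x, λ=1.0) = max.(x, 0)

prox_nonneg([-1.0, 0.5, 2.0])  # → [0.0, 0.5, 2.0]
```

So the primal prox step of a constrained PDHG is cheap; the harder part is rederiving the updates for the rest of the splitting.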
