Speedup of `\` solver bottleneck

Below is my implementation of the block principal pivoting algorithm for solving a non-negative linear least squares problem. This function is the bottleneck in my code (running on Julia v0.6.2), which calls it in an inner loop with square matrices `AtA` of size 200–600 and corresponding vectors `Atb` and `x0` as inputs.

Is there any way to speed this up?

Based on profiling a MATLAB version of this, it appears that essentially all the time is spent solving the linear system `x[F] = AtA[F,F]\Atb[F]`, where `F` is a mask that changes significantly every iteration. I have tried prepending `@views` and/or `@inbounds`, without any effect on the timings (measured using `@btime`). Also, `@code_warntype` does not indicate any type instability.
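
Here is a minimal sketch of how that line can be timed in isolation (made-up SPD matrix and random mask standing in for the real data; on Julia ≥ 0.7 the linear-algebra names below also need `using LinearAlgebra`):

```julia
using BenchmarkTools

# Made-up data standing in for the real inputs.
n   = 400
B   = randn(n, n)
AtA = B' * B + n * I      # symmetric positive definite by construction
Atb = randn(n)
F   = rand(n) .> 0.5      # stand-in for the pivoting mask

@btime $AtA[$F, $F] \ $Atb[$F];
```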

Any advice would be much appreciated!

Here is the full function:

```julia
function nnls_bpp(AtA, Atb, x0)

    n = length(Atb)
    F = x0 .> 0                      # passive set: currently unconstrained variables

    # Solve the unconstrained subproblem on the current passive set
    x = zeros(n)
    x[F] = AtA[F, F] \ Atb[F]
    y = AtA * x - Atb                # dual variables (gradient of the objective)
    xFneg = (x .< 0) .& F            # primal infeasibilities
    yGneg = (y .< 0) .& .!F          # dual infeasibilities
    nInfeasible = sum(xFneg .| yGneg)

    p = 3                            # full exchanges allowed before the backup rule
    t = n + 1                        # lowest infeasibility count seen so far
    while nInfeasible > 0

        if nInfeasible < t           # progress: record it and reset the counter
            t = nInfeasible
            p = 3
            F[xFneg] = false
            F[yGneg] = true
        elseif p >= 1                # no progress, but full exchanges remain
            p -= 1
            F[xFneg] = false
            F[yGneg] = true
        else                         # backup rule: flip only the last infeasible index
            idx = findlast(xFneg .| yGneg)
            F[idx] = !F[idx]
        end

        x = zeros(n)
        x[F] = AtA[F, F] \ Atb[F]
        y = AtA * x - Atb
        xFneg = (x .< 0) .& F
        yGneg = (y .< 0) .& .!F
        nInfeasible = sum(xFneg .| yGneg)

    end

    return x

end
```
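
For context, a hypothetical call mirroring the inner-loop usage described above (the sizes here are made up):

```julia
# Hypothetical inputs; in the real code AtA = A'A and Atb = A'b come
# from the surrounding inner loop.
A = randn(1000, 400)
b = randn(1000)

x = nnls_bpp(A' * A, A' * b, zeros(400))
```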

Have you tried using other solvers like IterativeSolvers.jl yet? Are there special matrix properties to exploit?

IterativeSolvers.jl does not seem to support non-negativity constraints needed for this NNLS problem.

I have considered the old Lawson-Hanson active-set algorithm, as well as the somewhat more recent “Fast NNLS” by Bro and De Jong. The block principal pivoting algorithm turns out to be an order of magnitude faster than these.

The matrix AtA is symmetric and appears to be positive definite.
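
For what it’s worth, those two properties can be checked directly; a quick sketch (`isposdef` attempts a Cholesky factorization under the hood, and on Julia ≥ 0.7 both names live in `LinearAlgebra`):

```julia
issymmetric(AtA) && isposdef(AtA)   # true if AtA is symmetric positive definite
```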

If it’s SPD then use `cg` from IterativeSolvers.jl. Or you can try wrapping `AtA` in a `Symmetric` before `\` to speed some operations up.
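
Sketched out, the two suggestions look something like this (reusing `AtA`, `Atb`, and `F` from the post):

```julia
using IterativeSolvers   # provides cg

# Conjugate gradient on the SPD subsystem:
xF = cg(AtA[F, F], Atb[F])

# Or let \ dispatch on the symmetric structure:
xF = Symmetric(AtA[F, F]) \ Atb[F]
```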

And neither does `\`?

Thanks for the suggestions! (Sorry, I misunderstood you: I thought you meant using `cg` to replace my entire function.)

I just tried `cg` from IterativeSolvers. Unfortunately, it is 3-5 times slower than `\` for my problem.

Surprisingly, replacing `AtA[F,F]` with `Symmetric(AtA[F,F])` results in a 20% slowdown.

Try Optim.jl to solve the whole problem. Use `Fminbox{ConjugateGradient}`.
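
A sketch of what that could look like, treating NNLS as a box-constrained quadratic (modern Optim.jl spells the wrapper `Fminbox(ConjugateGradient())`; the names `AtA`, `Atb`, `x0` are from the post):

```julia
using Optim
using LinearAlgebra   # for dot (stdlib on Julia ≥ 0.7)

# NNLS as min ½·x'AtA·x − Atb'x subject to x ≥ 0:
f(x)     = 0.5 * dot(x, AtA * x) - dot(Atb, x)
g!(G, x) = (G .= AtA * x .- Atb)

lower  = zeros(length(Atb))
upper  = fill(Inf, length(Atb))
x_init = max.(x0, 1e-8)   # Fminbox wants a starting point inside the box

res = optimize(f, g!, lower, upper, x_init, Fminbox(ConjugateGradient()))
x = Optim.minimizer(res)
```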

In case you want to try Lawson-Hanson, see https://github.com/rdeits/NNLS.jl for a good implementation (a direct port of the old Fortran code that doesn’t allocate and is fully generic).
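
A minimal usage sketch, assuming the original design matrix `A` and vector `b` (NNLS.jl works on the least-squares problem itself rather than the normal-equation forms `AtA`/`Atb`):

```julia
using NNLS

# Solves min ‖A*x − b‖ subject to x ≥ 0:
x = nnls(A, b)
```

The package also provides a preallocated workspace API (`NNLSWorkspace`) for repeated solves in an inner loop, which is where the “doesn’t allocate” claim comes in.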


Perhaps there is some sort of analog to `@fastmath` for large matrix multiplication/division that computes `x\y` much more approximately. You probably don’t need a huge amount of precision (possibly ever, but certainly not until your final round of iterations). You might also be able to talk to BLAS/LAPACK directly and try to exploit information about your matrix (e.g. symmetry).
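
Along those lines, one way to exploit the symmetry via LAPACK directly is `posv!`, which solves an SPD system by Cholesky factorization in place (a sketch reusing `AtA`, `Atb`, `F` from the post; on Julia 0.6 the module is `Base.LinAlg.LAPACK` rather than `LinearAlgebra.LAPACK`):

```julia
using LinearAlgebra

AFF = AtA[F, F]              # explicit copies, since posv! overwrites its arguments
bF  = Atb[F]
LAPACK.posv!('U', AFF, bF)   # on return, bF holds the solution of AFF * x = bF
```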

Thanks for all the hints!