Speedup of `\` solver bottleneck

stst · May 31, 2018, 5:47am

Below is my implementation of the block principal pivoting algorithm for solving a non-negative linear least squares problem. This function is the bottleneck in my code (running on v0.6.2), which calls this in an inner loop with square matrices AtA of size 200-600 and appropriate vectors Atb and x0 as inputs.

Is there any way to speed this up?

Based on profiling a Matlab version of this, it appears that essentially all the time is spent solving the linear system: x[F]=AtA[F,F]\Atb[F], where F is a mask that changes significantly every iteration. I have tried prepending @views and/or @inbounds, without any effect on the timings (measured using @btime). Also, @code_warntype does not indicate any type instability.

Any advice would be much appreciated!

Here is the full function:

function nnls_bpp(AtA,Atb,x0)

    n = length(Atb)
    F = x0 .> 0

    x = zeros(n)
    x[F] = AtA[F,F]\Atb[F]
    y = AtA*x - Atb
    xFneg = (x .< 0) .& F
    yGneg = (y .< 0) .& .!F
    nInfeasible = sum(xFneg .| yGneg)

    p = 3
    t = n+1
    while nInfeasible > 0

        if nInfeasible < t
            t = nInfeasible
            p = 3
            F[xFneg] = false
            F[yGneg] = true
        elseif p >= 1
            p = p - 1
            F[xFneg] = false
            F[yGneg] = true
        else
            idx = findlast(xFneg .| yGneg)
            F[idx] = .!F[idx]
        end

        x = zeros(n)
        x[F] = AtA[F,F]\Atb[F]
        y = AtA*x - Atb
        xFneg = (x .< 0) .& F
        yGneg = (y .< 0) .& .!F
        nInfeasible = sum(xFneg .| yGneg)

    end

    return x

end

ChrisRackauckas · May 31, 2018, 5:52am

Have you tried using other solvers like IterativeSolvers.jl yet? Are there special matrix properties to exploit?

stst · May 31, 2018, 6:22am

IterativeSolvers.jl does not seem to support non-negativity constraints needed for this NNLS problem.

I have considered the old Lawson-Hanson active-set algorithm, as well as the somewhat more recent “Fast NNLS” by Bro-De Jong. The block prinicipal pivoting algorithm turns out to be an order of magnitude faster than these.

The matrix AtA is symmetric and appears to be positive definite.

ChrisRackauckas · May 31, 2018, 6:24am

If it’s SPD then use cg from IterativeSolvers.jl. Or you can try wrapping AtA in a Symmetric before \ to speed some operations up.

And neither does \?

stst · May 31, 2018, 6:40am

Thanks for the suggestions! (Sorry, I misunderstood you - thinking to use cg to replace my entire function.)

I just tried cg from IterativeSolvers. Unfortunately, it is 3-5 times slower than \ for my problem.

Surprisingly, replacing AtA[F,F] by Symmetric(AtA[F,F]) results in a 20% slowdown.

mohamed82008 · May 31, 2018, 7:47am

Try Optim.jl to solve the whole problem. Use Fminbox{ConjugateGradient}.

tkoolen · May 31, 2018, 12:29pm

In case you want to try Lawson-Hanson, see https://github.com/rdeits/NNLS.jl for a good implementation (a direct port of the old Fortran code that doesn’t allocate and is fully generic).

Juser · May 31, 2018, 8:51pm

Perhaps there is some sort of analog to FastMath for large matrix multiplication/division that solves x\y much more approximately. You probably don’t need a huge amount of precision (possibly ever, but certainly not until your final round of iterations). You might also be able to talk with BLAS directly and try to exploit information about your matrix (e.g. symmetry).

stst · June 1, 2018, 3:33am

Thanks for all the hints!

Topic		Replies	Views
How to solve this Ax=b faster? Numerics linearalgebra , linearsolve	24	2561	June 7, 2022
What is a recommended way to solve a linear equation? New to Julia question , linearalgebra , linearsolve	31	1323	May 8, 2025
Solving linear system BC = X where B is a symmetric positive semi-definite matrix Performance linearalgebra , matrices , linearsolve	7	368	June 6, 2024
Linear solver \(A, B) performance vs Matlab A\b General Usage	32	7669	May 21, 2017
Solvers for Block-Tridiagonal Numerics linearalgebra	14	254	June 24, 2025

Speedup of `\` solver bottleneck

Related topics