 # Speedup of `\` solver bottleneck

Below is my implementation of the block principal pivoting algorithm for solving a non-negative linear least squares (NNLS) problem. This function is the bottleneck in my code (running on Julia v0.6.2), which calls it in an inner loop with square matrices `AtA` of sizes 200 to 600 and correspondingly sized vectors `Atb` and `x0` as inputs.

Is there any way to speed this up?

Based on profiling a MATLAB version of this, it appears that essentially all the time is spent solving the linear system `x[F] = AtA[F,F] \ Atb[F]`, where `F` is a mask that changes significantly on every iteration. I have tried prepending `@views` and/or `@inbounds`, with no effect on the timings (measured using `@btime`). Also, `@code_warntype` does not indicate any type instability.
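For reference, the hot line can be isolated and benchmarked on synthetic data in the stated size range, for example (a sketch: the matrix and mask are random stand-ins, not my actual data; on v0.6 `I` is available from `Base` rather than `LinearAlgebra`):

```julia
using BenchmarkTools
using LinearAlgebra   # Julia >= 0.7; on v0.6 `I` is in Base

n = 300
A = randn(2n, n)
AtA = A'A + I                 # SPD Gram matrix, as in the full problem
Atb = A' * randn(2n)
F = rand(n) .> 0.5            # random mask standing in for the active set

@btime $AtA[$F, $F] \ $Atb[$F]   # interpolate globals to keep timings honest
```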

Any advice would be much appreciated!

Here is the full function:

```julia
function nnls_bpp(AtA, Atb, x0)
    n = length(Atb)
    F = x0 .> 0

    x = zeros(n)
    x[F] = AtA[F,F] \ Atb[F]
    y = AtA*x - Atb
    xFneg = (x .< 0) .& F
    yGneg = (y .< 0) .& .!F
    nInfeasible = sum(xFneg .| yGneg)

    p = 3
    t = n + 1
    while nInfeasible > 0
        if nInfeasible < t
            t = nInfeasible
            p = 3
            F[xFneg] = false
            F[yGneg] = true
        elseif p >= 1
            p = p - 1
            F[xFneg] = false
            F[yGneg] = true
        else
            idx = findlast(xFneg .| yGneg)
            F[idx] = !F[idx]
        end

        x = zeros(n)
        x[F] = AtA[F,F] \ Atb[F]
        y = AtA*x - Atb
        xFneg = (x .< 0) .& F
        yGneg = (y .< 0) .& .!F
        nInfeasible = sum(xFneg .| yGneg)
    end

    return x
end
```

Have you tried using other solvers like IterativeSolvers.jl yet? Are there special matrix properties to exploit?

IterativeSolvers.jl does not seem to support non-negativity constraints needed for this NNLS problem.

I have considered the old Lawson-Hanson active-set algorithm, as well as the somewhat more recent “Fast NNLS” by Bro and De Jong. The block principal pivoting algorithm turns out to be an order of magnitude faster than these.

The matrix `AtA` is symmetric and appears to be positive definite.
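For what it's worth, positive definiteness can be checked directly, since `isposdef` attempts a Cholesky factorization under the hood (a minimal sketch on random data, not the actual matrix):

```julia
using LinearAlgebra   # Julia >= 0.7; on v0.6 these names are in Base

A = randn(300, 50)
AtA = A'A                  # a Gram matrix like this is positive semidefinite;
                           # with full column rank it is positive definite
isposdef(Symmetric(AtA))   # true when the Cholesky factorization succeeds
```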

If it’s SPD then use `cg` from IterativeSolvers.jl. Or you can try wrapping `AtA` in a `Symmetric` before `\` to speed some operations up.
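A `cg` call on the normal-equations system would look roughly like this (a sketch on random SPD data, not the poster's problem; the shift by `I` just keeps the example safely positive definite):

```julia
using IterativeSolvers
using LinearAlgebra   # Julia >= 0.7

A = randn(400, 300)
AtA = Symmetric(A'A + I)   # SPD by construction
Atb = A' * randn(400)

x = cg(AtA, Atb)   # conjugate gradient only needs matrix-vector products
```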

And neither does `\`?

Thanks for the suggestions! (Sorry, I misunderstood you - thinking to use `cg` to replace my entire function.)

I just tried `cg` from `IterativeSolvers`. Unfortunately, it is 3-5 times slower than `\` for my problem.

Surprisingly, replacing `AtA[F,F]` by `Symmetric(AtA[F,F])` results in a 20% slowdown.
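Another option along these lines is to factorize explicitly, so that `\` does no structure-checking or pivoting of its own; for an SPD matrix a Cholesky factorization is the natural choice. A sketch on random data (on v0.6 the function is spelled `cholfact`; from v0.7 on it is `cholesky`):

```julia
using LinearAlgebra   # Julia >= 0.7; on v0.6 use Base.cholfact

# Random SPD stand-in for the subproblem AtA[F,F] \ Atb[F]
A = randn(400, 250)
S = A'A + I
b = A' * randn(400)

xF = cholesky(Hermitian(S)) \ b   # Cholesky solve: no pivoting, no symmetry check
```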

Try `Optim.jl` to solve the whole problem. Use Fminbox{ConjugateGradient}.

In case you want to try Lawson-Hanson, see https://github.com/rdeits/NNLS.jl for a good implementation (a direct port of the old Fortran code that doesn’t allocate and is fully generic).
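If the package's exported API is as its README shows, usage would be roughly the following (a sketch on a tiny made-up problem; note it takes `A` and `b` directly rather than the normal-equations form `AtA`, `Atb`):

```julia
using NNLS   # https://github.com/rdeits/NNLS.jl

# Tiny illustrative problem
A = [1.0 0.0; 0.0 1.0; 1.0 1.0]
b = [2.0, -1.0, 1.0]

x = nnls(A, b)   # minimize ‖Ax - b‖ subject to x .>= 0
```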


Perhaps there is some analog of `@fastmath` for large matrix solves that computes `A \ b` only approximately. You probably don’t need a huge amount of precision (possibly ever, but certainly not until your final rounds of iterations). You might also be able to talk to BLAS/LAPACK directly and exploit what you know about your matrix (e.g. symmetry).
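Talking to LAPACK directly is in fact possible from Julia without extra packages: `posv!` performs an in-place Cholesky factorization and solve for SPD systems, skipping the generic dispatch of `\`. A sketch on random data (on v0.6 the module path is `Base.LinAlg.LAPACK`; from v0.7 on it is `LinearAlgebra.LAPACK`):

```julia
using LinearAlgebra   # Julia >= 0.7; on v0.6 use Base.LinAlg.LAPACK

A = randn(400, 250)
S = A'A + I              # random SPD stand-in for AtA[F,F]
b = A' * randn(400)

# posv! overwrites its arguments: S with its Cholesky factor, b with the solution,
# so pass copies if the originals are still needed
_, xF = LAPACK.posv!('U', copy(S), copy(b))
```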

Thanks for all the hints!