How to efficiently do a martix multiplication?

using LinearAlgebra, BenchmarkTools
function myAllocTest(A)
    W = copy(A); # lets not do uninitialized
    D = copy(A);
    m = 2.0
    pb = Vector{Float64}(undef, 50)
    Dp = Vector{Float64}(undef, 50)
    @views for i=1:50
        p = (pb[1:end-i+1] .= D[i:end, i])
        mul!(Dp, D[i:end, :]', p) 
        D[i:end, :] .-= m .* p .* Dp'
        mul!(Dp, W[:,i:end], p) 
        W[:, i:end] .-= Dp .* m .* p';
    end
    D, W
end
A = rand(50,50);
D1, W1 = myAllocTest(A);

Testing vs original version:

function myAllocTest_old(A)
    W = copy(A);
    D = copy(A);
    m = 2.0
    for i=1:50
        p = D[i:end, i]
        D[i:end, :] = D[i:end, :] - (m*p)*(p'*D[i:end,:])
        W[:, i:end] = W[:, i:end] - (W[:,i:end]*p)*(m*p)';
    end
    D, W
end
D2, W2 = myAllocTest_old(A);

Correctness test:

julia> D1, W1 = myAllocTest(A);

julia> D2, W2 = myAllocTest_old(A);

julia> D1[isnan.(D1)] .= 12312431234.12341;

julia> D2[isnan.(D2)] .= 12312431234.12341;

julia> W1[isnan.(W1)] .= 12312431234.12341;

julia> W2[isnan.(W2)] .= 12312431234.12341;

julia> D1 ≈ D2
true

julia> W1 ≈ W2
true

I set the (huge number of) NaN values to the same non-NaN number, for the sake of getting it to return true for matching NaNs. But I’d wager to bet your original code is not doing what you intended, because the results look like total junk.

Benchmarks of three functions returning total junk (I’m adding foo(A) = nothing to the mix of useless functions):

julia> @benchmark myAllocTest($A)
BenchmarkTools.Trial:
  memory estimate:  40.25 KiB
  allocs estimate:  6
  --------------
  minimum time:     137.796 μs (0.00% GC)
  median time:      139.075 μs (0.00% GC)
  mean time:        140.731 μs (0.64% GC)
  maximum time:     1.523 ms (88.13% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark myAllocTest_old($A)
BenchmarkTools.Trial:
  memory estimate:  4.07 MiB
  allocs estimate:  734
  --------------
  minimum time:     645.335 μs (0.00% GC)
  median time:      666.032 μs (0.00% GC)
  mean time:        762.999 μs (11.55% GC)
  maximum time:     1.969 ms (56.75% GC)
  --------------
  samples:          6549
  evals/sample:     1

julia> @benchmark foo($A)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.016 ns (0.00% GC)
  median time:      0.020 ns (0.00% GC)
  mean time:        0.020 ns (0.00% GC)
  maximum time:     0.331 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

The point of foo is that it’s easy to be fast if you don’t care about being correct or useful…

On that note, you can’t just add @views to your original code, because that will change the answer. If p is a view, it changes when you update D[i:end, :], meaning you’ll get a different answer when you calculate (W[:,i:end]*p)*(m*p)'. Notice how I explicitly copy the view over to pb, so that it wont change and gets the same (mostly NaN-filled) result.

3 Likes