using LinearAlgebra, BenchmarkTools
function myAllocTest(A)
W = copy(A); # lets not do uninitialized
D = copy(A);
m = 2.0
pb = Vector{Float64}(undef, 50)
Dp = Vector{Float64}(undef, 50)
@views for i=1:50
p = (pb[1:end-i+1] .= D[i:end, i])
mul!(Dp, D[i:end, :]', p)
D[i:end, :] .-= m .* p .* Dp'
mul!(Dp, W[:,i:end], p)
W[:, i:end] .-= Dp .* m .* p';
end
D, W
end
A = rand(50,50);
D1, W1 = myAllocTest(A);
Testing vs original version:
function myAllocTest_old(A)
W = copy(A);
D = copy(A);
m = 2.0
for i=1:50
p = D[i:end, i]
D[i:end, :] = D[i:end, :] - (m*p)*(p'*D[i:end,:])
W[:, i:end] = W[:, i:end] - (W[:,i:end]*p)*(m*p)';
end
D, W
end
D2, W2 = myAllocTest_old(A);
Correctness test:
julia> D1, W1 = myAllocTest(A);
julia> D2, W2 = myAllocTest_old(A);
julia> D1[isnan.(D1)] .= 12312431234.12341;
julia> D2[isnan.(D2)] .= 12312431234.12341;
julia> W1[isnan.(W1)] .= 12312431234.12341;
julia> W2[isnan.(W2)] .= 12312431234.12341;
julia> D1 ≈ D2
true
julia> W1 ≈ W2
true
I set the (huge number of) NaN
values to the same non-NaN
number, for the sake of getting it to return true for matching NaN
s. But I’d wager to bet your original code is not doing what you intended, because the results look like total junk.
Benchmarks of three functions returning total junk (I’m adding foo(A) = nothing
to the mix of useless functions):
julia> @benchmark myAllocTest($A)
BenchmarkTools.Trial:
memory estimate: 40.25 KiB
allocs estimate: 6
--------------
minimum time: 137.796 μs (0.00% GC)
median time: 139.075 μs (0.00% GC)
mean time: 140.731 μs (0.64% GC)
maximum time: 1.523 ms (88.13% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark myAllocTest_old($A)
BenchmarkTools.Trial:
memory estimate: 4.07 MiB
allocs estimate: 734
--------------
minimum time: 645.335 μs (0.00% GC)
median time: 666.032 μs (0.00% GC)
mean time: 762.999 μs (11.55% GC)
maximum time: 1.969 ms (56.75% GC)
--------------
samples: 6549
evals/sample: 1
julia> @benchmark foo($A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 0.016 ns (0.00% GC)
median time: 0.020 ns (0.00% GC)
mean time: 0.020 ns (0.00% GC)
maximum time: 0.331 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
The point of foo
is that it’s easy to be fast if you don’t care about being correct or useful…
On that note, you can’t just add @view
s to your original code, because that will change the answer. If p
is a view, it changes when you update D[i:end, :]
, meaning you’ll get a different answer when you calculate (W[:,i:end]*p)*(m*p)'
. Notice how I explicitly copy the view over to pb
, so that it wont change and gets the same (mostly NaN
-filled) result.