The definition of triu!
in dense.jl
of LinearAlgebra is copy-pasted below. Observing line 181, we see that zero(M[i])
is calculated at each iteration of the inner loop. Obviously, this can be moved out of the loop and re-written as ZERO = zero(M[1])
just below idx = 1
. Line 181 now should read M[i] = ZERO
. The same thing happens again at line 225 in tril!
.
function tril!(M::AbstractMatrix, k::Integer)
m, n = size(M)
if !(-m - 1 <= k <= n - 1)
throw(ArgumentError(string("the requested diagonal, $k, must be at least ",
"$(-m - 1) and at most $(n - 1) in an $m-by-$n matrix")))
end
idx = 1
for j = 0:n-1
ii = min(max(0, j-k), m)
for i = idx:(idx+ii-1)
M[i] = zero(M[i])
end
idx += m
end
M
end
Benchmarking the original and modified versions shows a small performance improvement of about 8% .
using BenchmarkTools
a = rand(1000,1000);
julia> @btime triu!(a,0);
187.311 μs (0 allocations: 0 bytes)
julia> @btime my_triu!(a,0);
173.455 μs (0 allocations: 0 bytes)
Should I open an issue to modify both functions or did I miss something?
EDIT:
Oh, sorry, my fault. I modified a
between the two calls. The compiler of 0.7-alpha seems too smart and likely, it optimized that away and constant-propagated that ZERO
.
julia> a = rand(1000,1000);
julia> b = copy(a);
julia> @btime triu!(a,0);
173.969 μs (0 allocations: 0 bytes)
julia> @btime my_triu!(b,0);
173.456 μs (0 allocations: 0 bytes)