The definition of
dense.jl of LinearAlgebra is copy-pasted below. Observing line 181, we see that
zero(M[i]) is calculated at each iteration of the inner loop. Obviously, this can be moved out of the loop and re-written as
ZERO = zero(M) just below
idx = 1. Line 181 now should read
M[i] = ZERO. The same thing happens again at line 225 in
function tril!(M::AbstractMatrix, k::Integer) m, n = size(M) if !(-m - 1 <= k <= n - 1) throw(ArgumentError(string("the requested diagonal, $k, must be at least ", "$(-m - 1) and at most $(n - 1) in an $m-by-$n matrix"))) end idx = 1 for j = 0:n-1 ii = min(max(0, j-k), m) for i = idx:(idx+ii-1) M[i] = zero(M[i]) end idx += m end M end
Benchmarking the original and modified versions shows a small performance improvement of about 8% .
using BenchmarkTools a = rand(1000,1000); julia> @btime triu!(a,0); 187.311 μs (0 allocations: 0 bytes) julia> @btime my_triu!(a,0); 173.455 μs (0 allocations: 0 bytes)
Should I open an issue to modify both functions or did I miss something?
Oh, sorry, my fault. I modified
a between the two calls. The compiler of 0.7-alpha seems too smart and likely, it optimized that away and constant-propagated that
julia> a = rand(1000,1000); julia> b = copy(a); julia> @btime triu!(a,0); 173.969 μs (0 allocations: 0 bytes) julia> @btime my_triu!(b,0); 173.456 μs (0 allocations: 0 bytes)