# Potential small inefficiency in `triu!` and `tril!`?

The definition of `tril!` in `dense.jl` of LinearAlgebra is copy-pasted below. At line 181 of that file, `triu!` calls `zero(M[i])` on every iteration of its inner loop, and the same thing happens at line 225 in `tril!` (the `M[i] = zero(M[i])` line below). It looks like this could be moved out of the loop: compute `ZERO = zero(eltype(M))` once, just below `idx = 1`, and have the inner loop do `M[i] = ZERO` instead.

```julia
function tril!(M::AbstractMatrix, k::Integer)
    m, n = size(M)
    if !(-m - 1 <= k <= n - 1)
        throw(ArgumentError(string("the requested diagonal, $k, must be at least ",
            "$(-m - 1) and at most $(n - 1) in an $m-by-$n matrix")))
    end
    idx = 1
    for j = 0:n-1
        ii = min(max(0, j-k), m)
        for i = idx:(idx+ii-1)
            M[i] = zero(M[i])
        end
        idx += m
    end
    M
end
```
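For reference, here is a minimal sketch of what a hoisted version might look like. `my_triu!` below is an illustrative reconstruction, not necessarily the exact function benchmarked; it mirrors the linear-indexing loop above but zeroes the part below diagonal `k`, skips the range check on `k`, and assumes an ordinary 1-based matrix. It uses `zero(eltype(M))`, the scalar zero of the element type (`zero(M)`, by contrast, returns a zero matrix the size of `M`).

```julia
using LinearAlgebra  # for triu, used only to check the result

# Hypothetical sketch: triu! with the zero hoisted out of the inner loop.
# Assumes 1-based linear indexing and does not validate k.
function my_triu!(M::AbstractMatrix, k::Integer)
    m, n = size(M)
    ZERO = zero(eltype(M))   # computed once instead of once per element
    idx = 1                  # linear index of the first entry of column j+1
    for j = 0:n-1
        # rows 1:ii of this column are on or above diagonal k and are kept
        ii = min(max(0, j + 1 - k), m)
        for i = (idx + ii):(idx + m - 1)
            M[i] = ZERO
        end
        idx += m
    end
    return M
end

A = rand(4, 6)
my_triu!(copy(A), 1) == triu(A, 1)   # true
```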

Benchmarking the original `triu!` against my modified version (`my_triu!`) shows a small improvement of about 8%:

```julia
julia> using BenchmarkTools

julia> a = rand(1000,1000);

julia> @btime triu!(a,0);
  187.311 μs (0 allocations: 0 bytes)

julia> @btime my_triu!(a,0);
  173.455 μs (0 allocations: 0 bytes)
```

Should I open an issue to modify both functions, or did I miss something?

EDIT:

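Oh, sorry, my mistake: I had modified `a` between the two calls. The 0.7-alpha compiler seems to be smart enough already; it most likely optimizes the repeated `zero(M[i])` away and constant-propagates the zero itself, so hoisting it by hand gains nothing. With a separate copy for each version, the timings are essentially the same: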

```julia
julia> a = rand(1000,1000);

julia> b = copy(a);

julia> @btime triu!(a,0);
  173.969 μs (0 allocations: 0 bytes)

julia> @btime my_triu!(b,0);
  173.456 μs (0 allocations: 0 bytes)
```
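A plausible reason the manual hoisting buys nothing: for a bits type like `Float64`, `zero(x)` does not depend on the value of `x`, so the load of `M[i]` is dead and the compiler is free to replace the call with the constant `0.0`. The per-element form also keeps the generic code working for element types whose zero depends on the element itself; for a matrix of matrices, `zero(eltype(M))` is not even defined. A small illustration (the matrix `B` is just an example):

```julia
# For Float64 the result ignores the argument's value,
# so zero(M[i]) is the same constant on every iteration.
zero(3.7) === 0.0        # true
zero(-1.0e300) === 0.0   # true

# For a matrix of 2×2 matrices, the zero's size comes from the element.
B = [rand(2, 2) for _ in 1:2, _ in 1:2]
zero(B[1]) == zeros(2, 2)   # true
# zero(eltype(B))           # MethodError: no zero(::Type{Matrix{Float64}})
```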