Hah, that’s funny. Try it with any value other than 0:
julia> using BenchmarkTools

julia> x = Array{Float64}(undef, 1000, 1000);

julia> f1!(x) = fill!(x, 1.0)
f2!(x) = x .= 1.0;
f3!(x) = x .= 1;
f4!(x) = x[:] .= 1.0;
f5!(x) = x[:] .= 1;
f6!(x) = x[:,:] .= 1;
function f7!(x)
    @inbounds for i in eachindex(x)
        x[i] = 1
    end
end
function f8!(x)
    for i in eachindex(x)
        x[i] = 1
    end
end
f8! (generic function with 1 method)
julia> @btime f1!($x);
430.771 μs (0 allocations: 0 bytes)
julia> @btime f2!($x);
429.985 μs (0 allocations: 0 bytes)
julia> @btime f3!($x);
431.457 μs (0 allocations: 0 bytes)
julia> @btime f4!($x);
430.093 μs (3 allocations: 128 bytes)
julia> @btime f5!($x);
430.087 μs (3 allocations: 128 bytes)
julia> @btime f6!($x);
432.691 μs (1 allocation: 48 bytes)
julia> @btime f7!($x);
431.716 μs (0 allocations: 0 bytes)
julia> @btime f8!($x);
465.570 μs (0 allocations: 0 bytes)
So what’s going on here? Some of these forms allow the constant 0 to propagate “the whole way down” to the inner loop. If that 0 is available to LLVM at compile time, LLVM can replace the store loop with special instructions (effectively a memset) that zero the entire chunk of memory at once.
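One way to check the memset claim directly (a quick sketch; the helper names below are made up for this example) is to compare the LLVM IR when the zero is a literal versus when it arrives as a run-time argument:

using InteractiveUtils   # for @code_llvm

# Helper names z0!/zv! are invented for this illustration.
z0!(x) = fill!(x, 0.0)     # literal 0.0 that the compiler can see
zv!(x, v) = fill!(x, v)    # value only known at run time

x = Array{Float64}(undef, 1000, 1000);

# If constant propagation and inlining work out as expected, the first dump
# should contain a call to memset (e.g. `llvm.memset`), while the second
# shows an ordinary (possibly vectorized) store loop.
@code_llvm z0!(x)
@code_llvm zv!(x, 0.0)

Likewise, rerunning the @btime calls above with 0 instead of 1 should show the forms diverging again, which is the difference that started this thread.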
As for why some of these forms allow constant propagation and some don’t: it appears there was a strange edge case in the compiler back in the Julia 0.7 timeframe that prompted a simple workaround, and that workaround no longer appears to be necessary.
Yes, it’s because we have a peephole “performance optimization” in broadcast that routes simple scalar assignments to fill!, since that should be the fastest way to do it. But it backfired here…
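For intuition, the effect of that peephole is roughly the sketch below. This is only an illustration of the idea, not the actual code in Base.Broadcast, and assign_scalar! is a made-up name:

# Sketch only: conceptually, a scalar broadcast assignment like `x .= v`
# gets rerouted to fill!, whose inner loop is as simple as possible and
# therefore the easiest for Julia and LLVM to optimize.
assign_scalar!(dest::AbstractArray, v) = fill!(dest, v)

# Conceptually:
#   x .= 1.0   behaves like   assign_scalar!(x, 1.0)
#   x .= 0     behaves like   assign_scalar!(x, 0)   # a literal 0 can then
#                                                    # become a memset

With a nonzero value all of these paths bottom out in essentially the same ~430 μs store loop, as the timings above show; only the literal 0 can take the extra memset shortcut.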