I discovered an interesting thing while trying out some code on Julia 0.6. On my system, I can now fill an array of `Float64` with zeros in about half the time that `A .= 0.0` or `fill!(A, 0.0)` takes. The reason seems to be that when `fill!` is compiled with a constant fill value of zero, it becomes an `llvm.memset` instead of a loop.
But maybe someone who's good with LLVM voodoo could make the ordinary loop run as fast as a memset? This might speed up other code as well.
Here’s an example:
```julia
julia> versioninfo()
Julia Version 0.6.0-rc1.0
Commit 6bdb3950bd (2017-05-07 00:00 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> A = Array{Float64}(1_000_000);

julia> using BenchmarkTools

julia> @benchmark A .= 0.0
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     442.393 μs (0.00% GC)
  median time:      452.308 μs (0.00% GC)
  mean time:        467.586 μs (0.00% GC)
  maximum time:     1.237 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> immutable Zero end

julia> Base.convert(::Type{Float64}, ::Zero) = 0.0

julia> @benchmark A .= Zero()
BenchmarkTools.Trial:
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     237.058 μs (0.00% GC)
  median time:      240.746 μs (0.00% GC)
  mean time:        244.114 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
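To see where the difference comes from, one can look at the generated code. This is a sketch, assuming Julia 0.6, where `A .= x` lowers to `broadcast!(identity, A, x)`:

```julia
# Julia 0.6 syntax (`immutable` became `struct` in later versions).
# With a singleton type, the fill value is known at compile time.
immutable Zero end
Base.convert(::Type{Float64}, ::Zero) = 0.0

A = Array{Float64}(1_000_000)

# A .= x lowers to broadcast!(identity, A, x), so inspect that directly:
@code_llvm broadcast!(identity, A, 0.0)     # IR contains an explicit store loop
@code_llvm broadcast!(identity, A, Zero())  # look for a call to llvm.memset here
```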
(On Julia 0.5.2, both versions run at the slower speed.)
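In case it's useful, here is one way the trick could be packaged as a small helper. The name `zerofill!` is hypothetical (not part of Base), and the `convert` method is generalized to any `Number` element type:

```julia
# Julia 0.6 syntax; `zerofill!` is a made-up helper name, not a Base function.
immutable Zero end
Base.convert(::Type{T}, ::Zero) where {T<:Number} = zero(T)

# The singleton carries "zero" at the type level, so the compiled
# broadcast kernel sees a constant fill value.
zerofill!(A::AbstractArray) = (A .= Zero(); A)

A = rand(10)
zerofill!(A)
all(x -> x == 0.0, A)  # should be true
```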