I am puzzled by Julia's performance: somehow it matters whether I fully specialize the function arguments.
Consider the two functions:
```julia
@noinline function update_gen!(y::AbstractVector{T}, x::AbstractVector{T}) where {T<:Number}
    @inbounds y[1] = x[1]
    return nothing
end

@noinline function update_float!(y::Vector{Float64}, x::Vector{Float64})
    @inbounds y[1] = x[1]
    return nothing
end
```
and call them with some input:

```julia
t_a = [51.31]
t_b = [12.3]
update_gen!(t_a, t_b)
update_float!(t_a, t_b)
```
No matter how I measure performance (`@time`, `@benchmark`, …), `update_float!` is always faster by a factor of ~2, while the output of `@code_llvm` and `@code_native` is, as expected, exactly the same for both versions.
I would be very grateful for an explanation of this behavior.
This is strange indeed, as I don’t see such a difference.
```
julia> @benchmark update_float!($t_a, $t_b)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  14.715 ns … 30.268 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.066 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.113 ns ±  0.324 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▃▃▅█▇ ▂
  ▄▃▁▄▄▅▅▆▇██████▇▃▁▄▃▁▁▄▄▁▁▃▃▁▃▄▃▄▄▅▅▆▅▆▆▅▆▆▆▇▇▆▇▆▇▇▇▇▇▆▇▇▇▆ █
  14.7 ns      Histogram: log(frequency) by time        16.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark update_gen!($t_a, $t_b)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  14.693 ns …  1.314 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.066 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.233 ns ± 13.017 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂▃▂▃▃▇▇█ ▂
  ▄▁▁▁▁▄▄▃▁▅▄▆▅▆██████████▆▁▃▁▁▃▁▁▁▁▁▁▁▃▁▁▃▄▄▁▃▁▃▃▁▃▁▃▃▁▃▄▄▄▄ █
  14.7 ns      Histogram: log(frequency) by time        15.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
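For what it's worth, BenchmarkTools can compare two trials directly instead of eyeballing the histograms; a quick sketch using its `judge` function (not part of the original output):

```julia
using BenchmarkTools

t_gen = @benchmark update_gen!($t_a, $t_b)
t_flt = @benchmark update_float!($t_a, $t_b)

# judge compares two estimates against a tolerance (5% on time by default)
# and reports :invariant when the difference is within noise.
judge(median(t_gen), median(t_flt))
```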
Try dropping the interpolation of the inputs and outputs, and the `Float64` version becomes significantly faster.
Dropping the interpolation means that you are timing dynamic dispatch, which is not typically relevant to performance in real applications.
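To make the dispatch cost visible, compare an interpolated call against a non-interpolated one (a minimal sketch, assuming the definitions above):

```julia
using BenchmarkTools

# $-interpolation passes t_a/t_b as typed locals: the call is statically
# dispatched, so the benchmark measures only the work inside the function.
@benchmark update_gen!($t_a, $t_b)

# Without interpolation, t_a and t_b are non-const globals whose types are
# unknown at the call site, so every evaluation also pays for a dynamic
# method lookup; that lookup, not the function body, is what differs here.
@benchmark update_gen!(t_a, t_b)
```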