I am puzzled by Julia's performance: somehow it matters whether I fully specialize the function arguments.
Consider the two functions:
```julia
@noinline function update_gen!(y::AbstractVector{T}, x::AbstractVector{T}) where {T<:Number}
    @inbounds y[1] = x[1]
    return nothing
end

@noinline function update_float!(y::Vector{Float64}, x::Vector{Float64})
    @inbounds y[1] = x[1]
    return nothing
end
```
and call them with some input:

```julia
t_a = [51.31]
t_b = [12.3]
update_gen!(t_a, t_b)
update_float!(t_a, t_b)
```
No matter how I measure performance (`@time`, `@benchmark`, …), `update_float!` is always faster by a factor of ~2, while the output of `@code_llvm` and `@code_native` is, as expected, exactly the same for both versions.
I would be very grateful for an explanation of this behavior.
This is strange indeed, as I don’t see such a difference.
```
julia> @benchmark update_float!($t_a, $t_b)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  14.715 ns … 30.268 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.066 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.113 ns ±  0.324 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▃▃▅█▇ ▂
  ▄▃▁▄▄▅▅▆▇██████▇▃▁▄▃▁▁▄▄▁▁▃▃▁▃▄▃▄▄▅▅▆▅▆▆▅▆▆▆▇▇▆▇▆▇▇▇▇▇▆▇▇▇▆ █
  14.7 ns      Histogram: log(frequency) by time        16.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark update_gen!($t_a, $t_b)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  14.693 ns …  1.314 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.066 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.233 ns ± 13.017 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂▃▂▃▃▇▇█ ▂
  ▄▁▁▁▁▄▄▃▁▅▄▆▅▆██████████▆▁▃▁▁▃▁▁▁▁▁▁▁▃▁▁▃▄▄▁▃▁▃▃▁▃▁▃▃▁▃▄▄▄▄ █
  14.7 ns      Histogram: log(frequency) by time        15.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
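For what it's worth, BenchmarkTools can compare two trials directly instead of eyeballing the histograms; a quick sketch using its `judge` function (not part of the original output):

```julia
using BenchmarkTools

t_gen = @benchmark update_gen!($t_a, $t_b)
t_flt = @benchmark update_float!($t_a, $t_b)

# judge compares two estimates against a tolerance (5% on time by default)
# and reports :invariant when the difference is within noise.
judge(median(t_gen), median(t_flt))
```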
Try dropping the interpolation of the inputs and outputs, and the `Float64` version becomes significantly faster.
Dropping the interpolation means that you are timing dynamic dispatch, which is not typically relevant to performance in real applications.
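To make the dispatch cost visible, compare an interpolated call against a non-interpolated one (a minimal sketch, assuming the definitions above):

```julia
using BenchmarkTools

# $-interpolation passes t_a/t_b as typed locals: the call is statically
# dispatched, so the benchmark measures only the work inside the function.
@benchmark update_gen!($t_a, $t_b)

# Without interpolation, t_a and t_b are non-const globals whose types are
# unknown at the call site, so every evaluation also pays for a dynamic
# method lookup; that lookup, not the function body, is what differs here.
@benchmark update_gen!(t_a, t_b)
```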