Push! vs pushfirst! performance

I was surprised at this performance:

julia> function p(x, n)
           for i = 1:n
               push!(x, i)
           end
           x
       end
p (generic function with 1 method)
julia> function pf(x, n)
           for i = 1: n
               pushfirst!(x, i)
           end
           x  
       end
pf (generic function with 1 method)
julia> @benchmark pf(x, 100_000_000) setup=(x=Vector{Int}())
BenchmarkTools.Trial: 
  memory estimate:  834.17 MiB
  allocs estimate:  27
  --------------
  minimum time:     3.786 s (0.43% GC)
  median time:      3.820 s (0.22% GC)
  mean time:        3.820 s (0.22% GC)
  maximum time:     3.854 s (0.01% GC)
  --------------
  samples:          2
  evals/sample:     1
julia> @benchmark p(x, 100_000_000) setup=(x=Vector{Int}())
BenchmarkTools.Trial: 
  memory estimate:  834.17 MiB
  allocs estimate:  26
  --------------
  minimum time:     895.597 ms (0.04% GC)
  median time:      905.239 ms (0.90% GC)
  mean time:        904.526 ms (0.90% GC)
  maximum time:     917.439 ms (1.73% GC)
  --------------
  samples:          6
  evals/sample:     1

Am I benchmarking this correctly? pushfirst! is even slower than push! + reverse!:


julia> function pr(x, n)
           for i = 1:n
               push!(x, i)
           end
           reverse(x)
       end
pr (generic function with 1 method)
julia> @benchmark pr(x, 100_000_000) setup=(x=Vector{Int}())
BenchmarkTools.Trial: 
  memory estimate:  1.56 GiB
  allocs estimate:  28
  --------------
  minimum time:     1.163 s (0.08% GC)
  median time:      1.193 s (3.46% GC)
  mean time:        1.212 s (4.24% GC)
  maximum time:     1.294 s (10.00% GC)
  --------------
  samples:          5
  evals/sample:     1 

julia> x = Vector{Int}(); y = Vector{Int}()
0-element Array{Int64,1}
julia> pf(x, 1000) == pr(y, 1000)
true

I think that’s traditionally the case in many languages: we optimize for appending, not inserting in front.

A naive implementation of push! (resp. pushfirst!) would re-allocate the array, copy all elements to the bigger buffer, and then add the element to the end (resp. the beginning). That’s insufferably slow because of all the copy operations, and that’s why it’s typically optimized by over-allocating; e.g. by the next power of two, the next Fibonacci number, or something close to either (see e.g. the Python implementation or the Julia implementation). Then the copy happens only every now and then, and on average you get good complexity.

I guess that technically, we could over-allocate the buffer and store the elements in the middle instead of in the end. That would give the same benefit both to loops of push! and loops of pushfirst!. I’m not aware of any language that does that by default – but given your surprise, maybe you are?

2 Likes

That’s exactly what julia does. And that’s why I don’t expect any asymptotic slowdown. The two should just be off by a constant factor with the pushing to the back easier to optimize and more optimized.

And this comparison is unfair. It is very common for some operation to be faster if you can first prepare the problem and transfrom it into a different problem beforee transforming it back. You pay the initial and final cost once and save time elsewhere. Note that you aren’t even reversing twice. The fair comparison is to call reverse! in every iteration and I’ll be surprised if that is faster.

8 Likes

Thank you for correcting me! I didn’t know Julia did that – quite cool.

I think you meant for i = 1:n. But your benchmark timings make it look like that is what you did.

Yes - that was deleted somehow from the code snippet. The function did indeed use 1:n. Edited.