Different ways to initialize an array of specified type

Let’s benchmark the following three:

julia> function a()
           x = Vector{Int}()
           x
       end
a (generic function with 1 method)

julia> function b()
           x::Vector{Int} = []
           x
       end
b (generic function with 1 method)

julia> function c()
           x = Int[]
           x
       end
c (generic function with 1 method)

julia> @benchmark a()
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     16.800 ns (0.00% GC)
  median time:      18.700 ns (0.00% GC)
  mean time:        21.590 ns (7.51% GC)
  maximum time:     1.045 μs (97.54% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark b()
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  2
  --------------
  minimum time:     44.209 ns (0.00% GC)
  median time:      46.727 ns (0.00% GC)
  mean time:        52.021 ns (7.29% GC)
  maximum time:     1.514 μs (94.97% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark c()
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     17.116 ns (0.00% GC)
  median time:      19.019 ns (0.00% GC)
  mean time:        22.369 ns (8.91% GC)
  maximum time:     1.356 μs (97.95% GC)
  --------------
  samples:          10000
  evals/sample:     999

What makes b() so much slower than a()? And are a() and c() equivalent, or is the 2-5% difference between them real?

Vector{Int}[] (which is what a() used as originally posted) is an empty vector of vectors, while Int[] is an empty vector of integers.

And [] is an empty vector of Any, which you then convert to Int[] in (b), which is why it allocates twice.
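A minimal sketch of what that conversion looks like outside a function, assuming that the annotated assignment in b() goes through convert(Vector{Int}, ...). The variable names here are made up for illustration:

```julia
# An untyped literal has element type Any
raw = []
@assert typeof(raw) == Vector{Any}

# Converting to Vector{Int} builds a new array rather than reusing `raw`
converted = convert(Vector{Int}, raw)
@assert typeof(converted) == Vector{Int}
@assert converted !== raw

# When the array already has the target type, convert returns the same object
ints = Int[]
@assert convert(Vector{Int}, ints) === ints
```

So the second allocation in b() would be the converted copy, which a() and c() never need.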

In b() you are initializing an array with eltype Any and then converting it to an array with eltype Int. My guess is that it has to allocate memory again and make a fresh array, which would explain the two allocations and why it takes more than twice as long.

Looking at @code_native a() etc. I think a and c are equivalent, whereas b needs to work much more.

Actually, a and c do different things (as @mcabbott pointed out): a() creates a vector-of-vectors while c() just creates a vector.

@anon37204545 I suspect you meant to do Vector{Int}() in a() instead of Vector{Int}[].
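For anyone skimming, the difference between the spellings can be checked directly. A small sketch (the variable names are only for illustration):

```julia
vov = Vector{Int}[]   # typed array literal: the element type is Vector{Int}
v1  = Vector{Int}()   # constructor call: an empty Vector{Int}
v2  = Int[]           # typed array literal: the element type is Int

@assert typeof(vov) == Vector{Vector{Int}}
@assert typeof(v1) == Vector{Int}
@assert typeof(v1) == typeof(v2)
```

The square brackets after a type always mean "array with this element type", so putting them after Vector{Int} adds one more level of nesting.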

You’re right, thanks, I overlooked that.

Thanks, fixed. (This doesn’t change the performance.)

So in general, initializing an empty array should be done in the manner of a() or c()?

Yes. But if you are counting the nanoseconds, then you probably don’t want to be creating empty arrays at all. push! is pretty clever but making the array the right size the first time is better. And re-using an array made outside the bit where nanoseconds count is even better.
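To make the three options concrete, here is a rough sketch using a made-up task (collecting the squares of 1..n). The function names are hypothetical, and which variant wins in practice is something to benchmark rather than assume:

```julia
# 1. Start empty and grow with push!
function squares_push(n)
    out = Int[]
    for i in 1:n
        push!(out, i^2)
    end
    return out
end

# 2. Allocate the final size once, then fill by index
function squares_prealloc(n)
    out = Vector{Int}(undef, n)
    for i in 1:n
        out[i] = i^2
    end
    return out
end

# 3. Fill a buffer that was allocated outside the hot code
function squares!(out, n)
    for i in 1:n
        out[i] = i^2
    end
    return out
end

buf = Vector{Int}(undef, 5)   # allocated once, reused across calls
squares!(buf, 5)
```

All three produce the same values; they differ only in when and how often memory is allocated.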

I know setting the right size up front is also more performant. But what if the vector can end up with anywhere between 3 and 8 elements? In that case I can't simply pre-allocate 8 undef elements.

It could be faster to give it length 8 up front, fill what you fill, and resize! afterwards. Worth a shot anyway.
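Something along these lines, as a sketch only. The filtering task and the name evens_resize are invented for illustration, and it assumes the input never yields more than 8 results:

```julia
# Hypothetical task: keep the even values from an input with at most 8 elements
function evens_resize(xs)
    out = Vector{Int}(undef, 8)   # allocate for the upper bound
    n = 0                         # number of slots actually filled
    for x in xs
        if iseven(x)
            n += 1
            out[n] = x
        end
    end
    resize!(out, n)               # shrink to the filled length
    return out
end

evens_resize(1:8)
```

The undef slots beyond n are never read, because resize! cuts the vector down before it is returned.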

Or possibly use sizehint! up front, and then just push!.
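A sketch of that variant, again with an invented filtering task and a hypothetical function name. sizehint! only suggests a capacity to reserve, so the vector still starts with length 0:

```julia
# Hypothetical task: keep the even values, expecting at most 8 of them
function evens_sizehint(xs)
    out = Int[]
    sizehint!(out, 8)     # suggest capacity for 8 elements; length stays 0
    for x in xs
        iseven(x) && push!(out, x)
    end
    return out
end

evens_sizehint(1:8)
```

Compared with filling and then calling resize!, this keeps the plain push! loop and needs no manual index bookkeeping. Whether it is actually faster for such small vectors is worth benchmarking.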
