Different ways to initialize an array of specified type

anon37204545 · July 28, 2020, 3:23pm

Let’s benchmark the following three:

julia> function a()
           x = Vector{Int}()
           x
       end
a (generic function with 1 method)

julia> function b()
           x::Vector{Int} = []
           x
       end
b (generic function with 1 method)

julia> function c()
           x = Int[]
           x
       end
c (generic function with 1 method)

julia> @benchmark a()
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     16.800 ns (0.00% GC)
  median time:      18.700 ns (0.00% GC)
  mean time:        21.590 ns (7.51% GC)
  maximum time:     1.045 μs (97.54% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark b()
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  2
  --------------
  minimum time:     44.209 ns (0.00% GC)
  median time:      46.727 ns (0.00% GC)
  mean time:        52.021 ns (7.29% GC)
  maximum time:     1.514 μs (94.97% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark c()
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     17.116 ns (0.00% GC)
  median time:      19.019 ns (0.00% GC)
  mean time:        22.369 ns (8.91% GC)
  maximum time:     1.356 μs (97.95% GC)
  --------------
  samples:          10000
  evals/sample:     999

What makes b() so much slower than a()? Are a() and c() equivalent, or does the 2-5% difference really exist?

mcabbott · July 28, 2020, 3:34pm

Vector{Int}[] is an empty vector of vectors, while Int[] is a vector of integers.

And [] is an empty vector of Any, which you then convert to Int[] in (b), which is why it allocates twice.

pdeffebach · July 28, 2020, 3:35pm

In b() you are initializing an array with eltype Any and then converting it to an array with eltype Int. My guess is that it has to re-allocate memory and make a fresh array. Thus the time is roughly double that of the other two.
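
The two allocations can be seen by spelling out the conversion by hand. This is only a sketch: it assumes that the typed declaration `x::Vector{Int} = []` behaves like an explicit `convert` call, and the variable names `tmp` and `x` are illustrative.

```julia
# [] without an element type is an empty Vector{Any} (first allocation)
tmp = []

# Converting to Vector{Int} builds a new array (second allocation);
# this is assumed to be what the typed declaration in b() does implicitly
x = convert(Vector{Int}, tmp)

@show eltype(tmp)
@show eltype(x)
@show x === tmp   # the two are different objects
```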

jbrea · July 28, 2020, 3:46pm

Looking at @code_native a() etc., I think a and c are equivalent, whereas b has to do much more work.

rdeits · July 28, 2020, 3:51pm

Actually, a and c do different things (as @mcabbott pointed out): a() creates a vector-of-vectors while c() just creates a vector.

@anon37204545 I suspect you meant to do Vector{Int}() in a() instead of Vector{Int}[].
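
A quick way to check which form produces which type is to compare element types. This is a minimal sketch with made-up variable names:

```julia
vov = Vector{Int}[]   # typed array literal: the element type is Vector{Int}
v1  = Vector{Int}()   # constructor call: an empty vector of Int
v2  = Int[]           # typed array literal: the element type is Int

@show eltype(vov)
@show eltype(v1)
@show typeof(v1) == typeof(v2)
```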

jbrea · July 28, 2020, 3:55pm

You’re right, thanks, I overlooked that.

anon37204545 · July 28, 2020, 4:21pm

Thanks, fixed. (This doesn’t change the performance.)

So in general, initializing an empty array should be done in the manner of a() or c()?

mcabbott · July 28, 2020, 4:35pm

Yes. But if you are counting the nanoseconds, then you probably don’t want to be creating empty arrays at all. push! is pretty clever but making the array the right size the first time is better. And re-using an array made outside the bit where nanoseconds count is even better.
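
As a rough illustration of these three options, here is a sketch. The function names are hypothetical, and the squares are just a placeholder workload:

```julia
# Option 1: start empty and grow with push!
function squares_push(n)
    out = Int[]
    for i in 1:n
        push!(out, i^2)
    end
    return out
end

# Option 2: allocate the final size once, then fill by index
function squares_prealloc(n)
    out = Vector{Int}(undef, n)
    for i in 1:n
        out[i] = i^2
    end
    return out
end

# Option 3: fill a buffer supplied by the caller, so the function
# itself does not allocate the output
function squares!(out::Vector{Int})
    for i in eachindex(out)
        out[i] = i^2
    end
    return out
end

buf = Vector{Int}(undef, 5)   # created once, outside the hot code
squares!(buf)
```

Which option wins in practice depends on the sizes involved, so it is worth running @benchmark on each for your own case.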

anon37204545 · July 28, 2020, 4:39pm

I know setting the right size is also more performant. But does it matter if a vector can have between 3 and 8 elements at the end? In that case, I can’t pre-allocate 8 undef elements to the array.

tbeason · July 28, 2020, 4:48pm

It could be faster to give it length 8 up front, fill what you fill, and resize! afterwards. Worth a shot anyway.
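
A sketch of that idea, assuming the upper bound of 8 from the question. The function name, the `maxlen` keyword, and the "keep positive values" rule are all made up for illustration:

```julia
# Hypothetical example: keep the positive entries of xs,
# assuming there are never more than maxlen of them
function keep_positive_resize(xs; maxlen = 8)
    out = Vector{Int}(undef, maxlen)   # allocate the upper bound up front
    n = 0
    for x in xs
        if x > 0
            n += 1
            out[n] = x                 # errors if the assumption is violated
        end
    end
    return resize!(out, n)             # shrink to the number actually filled
end

keep_positive_resize([3, -1, 4, -1, 5])
```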

Tamas_Papp · July 29, 2020, 8:09am

Or possibly use sizehint! up front, and then just push!.
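
The sizehint! variant could look like the sketch below. Again the function name, the `maxlen` keyword, and the filtering rule are hypothetical:

```julia
# Hypothetical example: keep the positive entries of xs
function keep_positive_hint(xs; maxlen = 8)
    out = Int[]
    sizehint!(out, maxlen)   # reserve capacity; length(out) is still 0
    for x in xs
        x > 0 && push!(out, x)
    end
    return out
end

keep_positive_hint([3, -1, 4, -1, 5])
```

Unlike preallocating with undef, going past the hinted size is not an error here; push! simply grows the array as usual.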
