Why is it slower when you declare an array like this?

I’ve read that when I declare an array like this:

a = [1,2,3,4,5]

it can be slower, but why? Is it because Julia has to determine the type?

Can it be slower when you don’t provide the dimensions of an array? For example, is this:

a = Array{Int32}(undef, 5)

faster than this:

b = Int32[]

It’s not just defining the array, it’s also filling in values.

2 Likes

Where did you read that? Slower than what? 1 + 1? Yes, sure. sleep(1)? Hopefully not… Something else? Well, that entirely depends on what the other thing is doing.

Determining the type is almost never part of the runtime cost, at least not for simple, built-in/Base-provided operations like these.

What do you mean by “providing the dimensions of an array”? In none of the 3 cases do you provide the dimension (1) as a number in the code, though in all 3 cases the dimension is determined before runtime.

Well, one allocates a bigger array than the other, so the bigger one should not be faster.

5 Likes

Int[] is basically the same as Array{Int}(undef, 0), i.e. a Vector of Ints of size 0. Since this vector needs less memory than Array{Int}(undef, 5), evaluating Int[] cannot be slower than evaluating Array{Int}(undef, 5).

Evaluating [1, 2, 3, 4, 5] is one step further than just evaluating Array{Int}(undef, 5): it also initializes the array elements. Therefore, evaluating [1, 2, 3, 4, 5] cannot be faster than evaluating Array{Int}(undef, 5).

You can verify this yourself:

julia> using BenchmarkTools

julia> @btime Int[]
  17.100 ns (1 allocation: 80 bytes)
Int64[]

julia> @btime Array{Int}(undef, 5)
  18.218 ns (1 allocation: 128 bytes)
5-element Array{Int64,1}:
 341787760
 341783024
 341755024
 177957920
      4934

julia> @btime [1, 2, 3, 4, 5]
  19.959 ns (1 allocation: 128 bytes)
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

Note that the allocated memory changes from Int[] to Array{Int}(undef, 5), but it doesn’t change from Array{Int}(undef, 5) to [1, 2, 3, 4, 5]. Also note that it may be the case that Int === Int32 on your computer, but this isn’t always true; as you can see, Int === Int64 on my machine.

If you write type-stable code, you usually don’t have to worry about type inference, because it happens at compile time rather than at run time.
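
A minimal illustration of what type stability means here (the function names are just for this example, not something from the original posts): the compiler can produce fast code only when it can pin down the return type from the argument types alone.

# Type-unstable: the return type depends on the *value* of x,
# so inference can only conclude "Int or Float64".
unstable(x) = x > 0 ? 1 : 1.0

# Type-stable: the return type is Float64 no matter what x is.
stable(x) = x > 0 ? 1.0 : -1.0

You can check this with @code_warntype: @code_warntype unstable(2) highlights a Union{Float64, Int64} return type, while @code_warntype stable(2) does not.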

In your examples, all arrays are one-dimensional (they are vectors). If you are talking about the number of elements, you are specifying it in each case (Julia can figure out that [1, 2, 3, 4, 5] has five elements), so this isn’t a bottleneck in most cases.

If what you want to do is to create an ordinary vector filled with specific values, then writing [1,2,3,4,5] is as fast as it gets.

If what you want to do is to create a vector of a specific size, but don’t care about the values it contains, then Vector{Int}(undef, 5) is as fast as it gets.

If what you want to do is to create an empty vector, then Int[] is as fast as it gets, same as Vector{Int}(undef,0) or Vector{Int}().
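
Putting those three cases side by side (a quick check, assuming a 64-bit machine where Int === Int64):

julia> length([1, 2, 3, 4, 5]), eltype([1, 2, 3, 4, 5])
(5, Int64)

julia> v = Vector{Int}(undef, 5); length(v), eltype(v)
(5, Int64)

julia> e = Int[]; length(e), eltype(e)
(0, Int64)

All three give you a Vector{Int64}; they only differ in how many elements there are and whether those elements hold defined values.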

1 Like

Here is where I was reading:

Initialize an Empty Array:

Arrays of type Any will generally not perform as well as those with a specified type.

From Stack Overflow:

The second option which explicitly defines type should generate faster code.

I’m not saying they’re right. I still have a lot to learn myself. Are they just making general overall statements?

Any[] is different from Int[], but [1, 2, 3] is not different from Int[1, 2, 3]. If you have an empty array, you need to give the element type because there are no values to infer the type from, and Julia will default to Any.
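
For example (outputs shown in the same pre-1.6 printing style as the benchmarks above; newer Julia versions print Vector{Int64} etc. instead):

julia> typeof([])          # no values and no element type: defaults to Any
Array{Any,1}

julia> typeof(Int[])
Array{Int64,1}

julia> typeof([1, 2, 3]) == typeof(Int[1, 2, 3])
true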

2 Likes

I don’t quite understand the connection with your original question here. [1,2,3,4,5] does not create an array of type Any, but creates a correctly typed array:

julia> typeof([1,2,3,4,5])
Array{Int64,1}

I was reading one of the links I posted and got confused about what it was saying about creating an array. I incorrectly assumed that when no type is mentioned, even with values provided during initialization, it was still slower. That’s why I asked.

So is that link right to some extent, then? When no type or values are provided and the element type defaults to Any, it’s slower?

Yes, an Any[] array tends to be slower to compute with than an array with a concrete element type, like Int[].

7 Likes

But note that this isn’t about the construction of the array object, but about what happens with it afterwards. A Vector{Any} is like a plain Python list that can hold any kind of object; a Vector{Int} is like a NumPy typed array that can only hold native integers. Code working with the latter will be much faster than code operating on the former.
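
One quick way to tell the two situations apart is to ask whether the array’s element type is concrete; only a concrete element type lets the compiler generate specialized code (a small check, not from the original thread):

julia> isconcretetype(eltype(Any[1, 2, 3]))
false

julia> isconcretetype(eltype([1, 2, 3]))
true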

9 Likes

If I could combine this answer along with kristoffer.carlsson’s answer, and select it, I would. :grinning:

1 Like

Thanks. I think I have enough internet points to let Kristoffer have this one :yum:

6 Likes

I think it always helps to benchmark things when you have questions.
Creating is about the same:

julia> @benchmark Int[]
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     19.459 ns (0.00% GC)
  median time:      22.452 ns (0.00% GC)
  mean time:        28.540 ns (17.54% GC)
  maximum time:     3.399 μs (99.33% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark Any[]
BenchmarkTools.Trial:
  memory estimate:  80 bytes
  allocs estimate:  1
  --------------
  minimum time:     17.761 ns (0.00% GC)
  median time:      19.854 ns (0.00% GC)
  mean time:        26.517 ns (20.25% GC)
  maximum time:     3.499 μs (98.78% GC)
  --------------
  samples:          10000
  evals/sample:     997

But we see a big difference in doing some basic operations with these arrays:

julia> xint = [1]; typeof(xint)
Array{Int64,1}

julia> xany = Any[1]; typeof(xany)
Array{Any,1}

julia> @benchmark $xint[1] *= 2
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.268 ns (0.00% GC)
  median time:      1.278 ns (0.00% GC)
  mean time:        1.289 ns (0.00% GC)
  maximum time:     11.207 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark $xany[1] *= 2
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.306 ns (0.00% GC)
  median time:      17.058 ns (0.00% GC)
  mean time:        16.972 ns (0.00% GC)
  maximum time:     36.068 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> x = randn(256);

julia> y = Any[xᵢ for xᵢ ∈ x];

julia> @btime sum($x)
  37.373 ns (0 allocations: 0 bytes)
0.3229189641652014

julia> @btime sum($y)
  3.142 μs (255 allocations: 3.98 KiB)
0.3229189641652014

julia> 3.142 / 37.373e-3
84.07138843550156
4 Likes

I’d like to point out a potential confusion in the question and solution as formatted by Discourse. The first post asks “Is this [an array of Int32] faster than this [an empty array of Int32]?”, which is followed by “Solved by… Yes, an Any array tends to be slower…”

So it looks like a Yes/No question is directly answered by a Yes, except @kristoffer.carlsson’s Yes was for a different question, and the direct answer is No (as pointed out by others).

Nobody did anything wrong, but I fear anyone who sees the top capsule solution will be led quite astray unless they read the entire thread. Not sure if there is a solution, but I think it’s unfortunate.

1 Like