Best Practice for Initializing an Array Prior to for Loop

Hello,

Is there a best practice for initializing arrays prior to for loops / does it matter? (I’ve seen a few similar posts but I don’t think there’s any that explain the differences, or when you might want to use one method or the other.)

E.g. is there a difference between, say:

a = fill(0.0, 10)

and

a = Vector{Float64}(undef, 10)

prior to, something like:

for i in eachindex(a)
    a[i] = <some calculation>
end

Thanks,

Dave

I typically use zeros(dims...), which takes a bit longer than undef, but allows simpler control flow when doing, e.g., a[i] += <some calculation>. For simple calculations, I’ll use array comprehensions:

a = [<some calculation> for i in 1:10]

For more-complicated calculations, you can use map with a do-block:

a = map(1:10) do aᵢ
    <some calculation>
end

Comprehension/map-ing has an additional advantage: it determines the output’s eltype and shape for you, which is great for some of Julia’s messier parametric types - who wants to type Array{MArray{Tuple{3}, Float64, 1, 3}}(undef, 10)?
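To make the three approaches above concrete, here is a minimal sketch. The calculations (i * k, i^2 / 2) are placeholders I made up for illustration; only zeros, comprehensions and map come from the post.

```julia
# Accumulating into a zero-initialized array: += needs a defined starting value.
a = zeros(5)
for i in eachindex(a), k in 1:3
    a[i] += i * k                    # placeholder calculation
end

# A comprehension works out the element type and shape from the expression.
b = [i^2 / 2 for i in 1:5]           # a Vector{Float64}
c = [(i, j) for i in 1:2, j in 1:3]  # a 2×3 matrix of tuples

# map with a do-block suits multi-line bodies; the last expression is the element.
d = map(1:5) do i
    x = i^2
    x / 2
end
```

Note that b and d hold the same values; the choice between the two is mostly about how long the body is.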


When using fill, you go over the whole array in memory and write your value into every element, which takes more time. It doesn’t matter for small arrays, but for large ones it can be significant:

julia> @time Vector{Float64}(undef, 100_000_000);
  0.000017 seconds (2 allocations: 762.940 MiB)

julia> @time fill(0.0, 100_000_000);
  0.229996 seconds (2 allocations: 762.940 MiB)

(not sure why the allocation numbers are the same?)

The downside of Vector{Float64}(undef, n) is that your array will initially contain arbitrary numbers (whatever is in memory), so if you use these values by mistake it can lead to hard-to-find bugs.

Personally, I use the undef constructor mainly when I have a vector of objects more complicated than numbers that I don’t want to initialize yet (e.g. Array{Matrix,1}(undef, 20)).
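A small sketch of that pattern, assuming a concrete element type Matrix{Float64} (my choice for the example, the post uses the abstract Matrix) and a made-up fill rule. isassigned reports whether a slot already holds an object:

```julia
# Reserve slots for 3 matrices without constructing any of them yet.
mats = Vector{Matrix{Float64}}(undef, 3)

before = isassigned(mats, 1)   # false: the slot holds no object yet

# Give each slot its own matrix once the data is available (placeholder values).
for i in eachindex(mats)
    mats[i] = fill(Float64(i), 2, 2)
end

after = isassigned(mats, 1)    # true
```

Reading mats[1] before the loop would throw an UndefRefError rather than return garbage, which is one reason undef is less risky for object arrays than for numeric ones.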


One disadvantage of using fill is that the array is filled with the same object, compared to a comprehension, which fills it with distinct objects:

julia> mutable struct example f::Int end

julia> a = fill(example(1), 3)
3-element Array{example,1}:
 example(1)
 example(1)
 example(1)

julia> b = [ example(1) for _ in 1:3 ]
3-element Array{example,1}:
 example(1)
 example(1)
 example(1)

julia> a[1] === a[2]
true

julia> b[1] === b[2]
false

julia> a[1].f = 3
3

julia> a
3-element Array{example,1}:
 example(3)
 example(3)
 example(3)

julia> b[1].f = 3
3

julia> b
3-element Array{example,1}:
 example(3)
 example(1)
 example(1)

This is because the expression in the comprehension is evaluated for each loop iteration, whereas the argument to fill is only evaluated once (before the call).


As for best practice - each version has its (dis)advantages, so use whichever version suits the representation of your problem best. I usually go with comprehensions if the initialization is a long piece of code (which I then put into its own function). When all the function does is initialize an array, I usually go with the undef version if the initialization is a little bit more complicated, and with zeros if I was going to start by zeroing the memory anyway.


The allocation numbers are the same because in both cases, only the backing memory is allocated. Each Float64 is 8 bytes, and 800_000_000 bytes ≈ 762.94 MiB.
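The arithmetic can be checked directly. This is just a sketch of the calculation; the variable names are mine, and sizeof on an array reports only the element data, not the small array header that @time also counts:

```julia
n = 100_000_000
nbytes = n * sizeof(Float64)   # 8 bytes per element, 800_000_000 in total
mib = nbytes / 2^20            # 1 MiB = 2^20 bytes, so roughly 762.94

# Both constructors need the same amount of element storage.
size_undef = sizeof(Vector{Float64}(undef, 10))
size_fill  = sizeof(fill(0.0, 10))
```

The only difference between the two calls is whether that memory is written to after it is allocated.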


Thanks for the explanation.

I tried your code on my machine using Julia 1.5.2, and the following result is interesting.

Why is creating the array with Vector(undef, 10^8) so much slower than with Vector{Float64}(undef, 10^8)?

julia> @time Vector(undef,10^8);
  0.366178 seconds (2 allocations: 762.940 MiB, 11.75% gc time)

julia> @time Vector{Float64}(undef,10^8);
  0.072459 seconds (2 allocations: 762.940 MiB, 97.68% gc time)

julia> @time fill(0.0,10^8);
  0.481662 seconds (2 allocations: 762.940 MiB, 14.65% gc time)

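Vector(undef, 10^8) without an element type is an Array{Any,1} with undefined elements. Its elements are references rather than inline Float64 values, and as far as I know Julia clears those references on allocation so that it can tell which elements are unassigned, so the memory does get written once.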
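You can see the difference between the two constructors on a small array. A quick sketch (the variable names are mine):

```julia
v_any = Vector(undef, 3)            # no element type given, so this is a Vector{Any}
v_f64 = Vector{Float64}(undef, 3)   # elements stored inline as raw Float64 bits

any_type = typeof(v_any)

# Any is not a bits type, so elements are references that start out unassigned.
any_assigned = isassigned(v_any, 1)   # false
# Float64 is a bits type, so every slot always holds some (arbitrary) value.
f64_assigned = isassigned(v_f64, 1)   # true
```

So unless you really want a container of arbitrary objects, spell out the element type.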
