I am asking this question in an effort to improve my understanding of computer science, not to complain about Julia's performance.

I understand that, in principle, if we know that we are going to be filling a vector with exactly 1000 entries, it is better to simply declare it as a 1000-element vector at the outset than to start with [] and push.

I also understand that if we know that our vector will have approximately/at least 1000 entries, but aren't certain, then we can improve the performance of the push operations by using sizehint!(). In practice, it seems like this is mainly used for dictionaries, heaps, and other higher-level data structures rather than vectors of numbers.
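For instance, a dictionary can be given the same hint before bulk insertion; the snippet below is a minimal sketch of my own (the contents are arbitrary), just to show the call:

julia> d = Dict{Int,Float64}();

julia> sizehint!(d, 1000);        # reserve table slots for roughly 1000 key/value pairs

julia> for k in 1:1000
           d[k] = k / 2           # insertions now avoid repeated rehashing as the table grows
       end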
My mental model of how sizehint!() works is as follows:
- If I call a = zeros(Float64, 1000), Julia allocates 1000 blocks of memory, fills them with zeros, and then “labels” them with the variable name a and a header that says “a is a 1000-element vector whose entries are stored over here.”
- If I call a = Float64[]; sizehint!(a, 1000), Julia allocates 1000 blocks of memory, builds a “fence” around them that says that other variables shouldn't have their values stored here, and then “labels” this with the variable name a and a header that says “a is a 0-element vector whose entries are stored over here.”

Now if I call push!(a, 0.5) 1000 times, the memory needed for the push is already “right next door,” so without iterating over the whole of a, Julia can simply increment its size and write the value of 0.5 to the end block in O(1) time.
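One quick way to poke at this picture (a rough check I am adding here, not an authoritative account of the internals): if sizehint! really reserves the room up front, the data buffer should not move while we push within that capacity.

julia> a = Float64[]; sizehint!(a, 1000);

julia> p = pointer(a);             # address of the reserved data buffer

julia> for _ in 1:1000
           push!(a, 0.5)           # each push stays within the reserved capacity
       end

julia> pointer(a) == p             # expected to be true if no reallocation happened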
If my mental model is correct, then the function bar() below should be faster than the function foo(), because while both functions allocate m Float64-sized blocks, the first one needlessly fills them with zeros before overwriting them with 0.5, whereas the second does not. The only additional computation required by bar() (in my mental model) is incrementing the size of a by 1 at each i, but this cannot be substantially more expensive than writing a bunch of zeros to memory.
julia> function foo(m)
           a = zeros(Float64, m)
           for i in 1:m
               a[i] = 0.5
           end
       end
foo (generic function with 1 method)
julia> function bar(m)
           a = Float64[]
           sizehint!(a, m)
           for i in 1:m
               push!(a, 0.5)
           end
       end
bar (generic function with 1 method)
But the opposite is the case: bar() is nearly 4.5 times slower.
julia> using BenchmarkTools
julia> @benchmark foo(50)
BenchmarkTools.Trial: 10000 samples with 972 evaluations.
 Range (min … max):  66.278 ns … 567.468 ns  ┊ GC (min … max): 0.00% … 84.62%
 Time  (median):     83.435 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   90.296 ns ±  45.138 ns  ┊ GC (mean ± σ):  4.83% ±  8.40%

 [unicode histogram omitted]

  66.3 ns         Histogram: log(frequency) by time          471 ns <
Memory estimate: 496 bytes, allocs estimate: 1.
julia> @benchmark bar(50)
BenchmarkTools.Trial: 10000 samples with 264 evaluations.
 Range (min … max):  294.227 ns …   5.683 μs  ┊ GC (min … max): 0.00% … 94.45%
 Time  (median):     300.496 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   325.789 ns ± 255.622 ns  ┊ GC (mean ± σ):  4.16% ±  5.02%

 [unicode histogram omitted]

  294 ns          Histogram: log(frequency) by time          495 ns <
Memory estimate: 512 bytes, allocs estimate: 2.
What's going on here?
(The results are similar if you write rand() instead of 0.5.)
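(Incidentally, the zero-fill that foo() pays for can also be skipped by asking for uninitialized memory; baz below is a variant I am adding only for illustration and have not benchmarked here:)

julia> function baz(m)
           a = Vector{Float64}(undef, m)   # allocate m slots without writing zeros into them
           for i in 1:m
               a[i] = 0.5
           end
       end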