To assess overall performance impact, let’s measure the entire pascal function with different initializations. Here’s a version of pascal that lets you provide an initialization function:
function pascal(c::Function, ::Type{T}, n::Integer) where {T<:Real}
    A = c(T, n)::Matrix{T}   # caller-supplied initializer produces the n×n matrix
    A[:, 1] .= one(T)        # first column is all ones
    A[1, :] .= one(T)        # first row is all ones
    for j = 2:n, i = 2:n
        A[i, j] = A[i-1, j] + A[i, j-1]   # each interior cell is the sum of its upper and left neighbors
    end
    return A
end
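To make the calling convention concrete, here’s the uninitialized version in action (output from a 64-bit machine, where Int is Int64). Passing undef memory is safe here because every cell is written before it is read:

julia> pascal((T, n) -> Matrix{T}(undef, n, n), Int, 5)
5×5 Matrix{Int64}:
 1  1   1   1   1
 1  2   3   4   5
 1  3   6  10  15
 1  4  10  20  35
 1  5  15  35  70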
Here’s a benchmarking function that measures the slowdown, relative to allocating uninitialized memory, of initializing with a shared zero value (zeros) and with a fresh zero value per element (a comprehension):
using BenchmarkTools
function brel(::Type{T}, n::Integer) where {T<:Real}
    tu = @belapsed(pascal((T, n) -> Matrix{T}(undef, n, n), $T, $n))         # uninitialized
    tz = @belapsed(pascal((T, n) -> zeros(T, n, n), $T, $n))                 # shared zero
    tc = @belapsed(pascal((T, n) -> [zero(T) for i = 1:n, j = 1:n], $T, $n)) # fresh zero per element
    tz/tu, tc/tu
end
Here are the results at various sizes for Int:
julia> brel(Int, 3)
(1.0511476966109345, 7.3473435575480694)
julia> brel(Int, 10)
(1.116188052967733, 37.760377706939884)
julia> brel(Int, 100)
(1.0082644628099173, 35.63914049586777)
julia> brel(Int, 1000)
(1.0412897581039173, 30.49603364426373)
Here are the results at various sizes for BigInt:
julia> brel(BigInt, 3)
(1.062226254042928, 4.883999755814663)
julia> brel(BigInt, 10)
(1.0436965517241379, 5.586206896551724)
julia> brel(BigInt, 100)
(1.0327586457883473, 4.387387605964587)
julia> brel(BigInt, 1000)
(1.0011408621658366, 2.7787992640152344)
The huge difference between zeros and comprehensions for Int seems like a matter of insufficient optimization: Int is a value type, so the comprehension version should, in theory, be possible to do as efficiently as zeros. However, that’s a tricky optimization, since the compiler would have to recognize that it can just fill the memory with a single repeated bit pattern. It could be done, but it’s not easy. In any case, here we see that initialization does have a significant impact on overall performance.
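One way to check that the gap is in the fill itself rather than in pascal is to time the three allocation strategies in isolation. Here’s a minimal sketch (absolute timings will vary by machine; the point is the relative cost):

using BenchmarkTools
n = 1000
@btime Matrix{Int}(undef, $n, $n);          # bare allocation, no initialization
@btime zeros(Int, $n, $n);                  # allocation plus a bulk zero fill (can use calloc/memset)
@btime [zero(Int) for i = 1:$n, j = 1:$n];  # element-by-element stores via generic iteration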
Luckily, I think, initialization is usually not the dominant cost in a computation - and if it is, then that’s a pretty good sign you are using a dense structure when you should be using a sparse one.
I don’t buy this sparsity argument. Ultimately, whether a data structure is dense or sparse, there’s a plain old dense array at the bottom that has to be allocated, and if we force it to go through an expensive initialization step for memory that’s just going to be overwritten, that’s wasted time that makes things slower. Whether the structure is dense or sparse isn’t really relevant. Perhaps you’re thinking of sparse data structures starting out empty and being grown incrementally; for example, doing this with sparse matrices:
using SparseArrays
n = 10^6
I = rand(1:n, 10^4)   # random row indices
J = rand(1:n, 10^4)   # random column indices
@time let
    S = spzeros(BigInt, n, n)   # start with an empty sparse matrix
    for (i, j) = zip(I, J)
        S[i, j] = 1             # insert one entry at a time, growing internal arrays as needed
    end
end
This does indeed start out with empty internal data structures and expand them one element at a time, growing them as necessary, which avoids the question of what value to initialize the internal arrays with since you have a value by the time you have to grow an array. However, this is also a notoriously slow way to initialize a sparse matrix—on my system, this code takes 0.84 seconds. If we instead use the sparse constructor, it’s much faster—doing sparse(I, J, big(1), n, n) takes only 3.8 milliseconds, 222x faster than incremental construction. Why is this version faster? Because it can pre-allocate the internal arrays with the right size instead of growing them incrementally. And it’s not just a small improvement—it’s two orders of magnitude.
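For reference, here is that constructor call in the same setup (same n, I and J as above); timings will of course vary by machine:

using SparseArrays
# pre-sized construction: the internal arrays are allocated at their final size
@time S = sparse(I, J, big(1), n, n)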
So even for sparse data structures, we want to allocate (uninitialized) arrays up front as much as possible and then fill them with the right values, which gets slower if we have to initialize the arrays with a default value before writing the final values. How much worse is it? Fortunately, this is easy enough to measure in Julia by patching the core sparse method, which I do in this benchmark script.
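Conceptually, the patch swaps out how the method’s internal index and value arrays get allocated. The helpers below are hypothetical, purely to illustrate the three strategies being compared; the actual script edits the allocations inside SparseArrays’ sparse method:

# Hypothetical helpers, one per initialization strategy; not part of SparseArrays.
alloc_none(::Type{T}, n)  where {T} = Vector{T}(undef, n)     # no initialization
alloc_zeros(::Type{T}, n) where {T} = zeros(T, n)             # zero-filled
alloc_compr(::Type{T}, n) where {T} = [zero(T) for _ = 1:n]   # comprehension, fresh zero per element

Results: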
# Initialization: none
BenchmarkTools.Trial: 79 samples with 1 evaluation.
Range (min … max): 60.279 ms … 71.610 ms ┊ GC (min … max): 0.81% … 4.61%
Time (median): 63.709 ms ┊ GC (median): 5.12%
Time (mean ± σ): 63.852 ms ± 1.853 ms ┊ GC (mean ± σ): 5.07% ± 0.51%
▁▃▆▁ ▁ ▃▁▆█ ▁
▄▁▁▁▁▁▄▁▇▇▇▄████▄█▇▇▇▄████▇▇▇▄▄▄▄█▇▇▄▁▁▄▁▇▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▇ ▁
60.3 ms Histogram: frequency by time 69.1 ms <
Memory estimate: 232.70 MiB, allocs estimate: 22.
# Initialization: zeros
BenchmarkTools.Trial: 75 samples with 1 evaluation.
Range (min … max): 64.880 ms … 71.832 ms ┊ GC (min … max): 5.18% … 4.67%
Time (median): 67.445 ms ┊ GC (median): 5.07%
Time (mean ± σ): 67.511 ms ± 1.447 ms ┊ GC (mean ± σ): 5.08% ± 0.52%
▂ ▂ ▅▂ ▂ ▂ █ ▂
█▁▁▁██▅▅▅▁▁▁▅█▅██▁▅▁▅██▅█▅█▅█▅▅██▁▅████▅▁▁▅▅▅██▁█▁▁▁▁█▁▅▅▅▅ ▁
64.9 ms Histogram: frequency by time 70.2 ms <
Memory estimate: 232.70 MiB, allocs estimate: 24.
# Initialization: comprehension
BenchmarkTools.Trial: 68 samples with 1 evaluation.
Range (min … max): 66.618 ms … 88.653 ms ┊ GC (min … max): 4.71% … 8.73%
Time (median): 73.164 ms ┊ GC (median): 10.21%
Time (mean ± σ): 73.644 ms ± 3.148 ms ┊ GC (mean ± σ): 10.12% ± 0.73%
▆▆▂ ▂▄ ██ ▄
▄▁▁▁▁▁▁▁▁▁▁▁▁▄█████████▆██▁▆▆▁▄▁▁▄▁▁▁▄▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▄ ▁
66.6 ms Histogram: frequency by time 85 ms <
Memory estimate: 236.51 MiB, allocs estimate: 200022.
If we compare minima, then array initialization with zeros is 7% slower and with a comprehension is 10% slower. If we compare medians or means, then zeros is 5% slower and the comprehension is 14% slower. Either way, those are non-negligible performance hits. I think this shows that even with sparse data structures, you don’t escape the initialization issue.