When possible, I rewrite the calculation of an array from the “pre-allocate, then fill in a for-loop” pattern into a single comprehension (sometimes a fairly complicated one). In my mind comprehensions seem faster because the pre-allocation step is fused with the actual population of the desired array.
Is this generally true or are the differences infinitesimal and I’m (sometimes) sacrificing readability for nothing?
Can you give an example where comprehensions are less readable than creating an array and populating it? I usually find that they are eminently readable, and only deviate from comprehensions when the compiler isn’t able to infer the type for some reason (and I can). But that happens less and less these days.
I had the feeling that comprehensions don’t know the final size in advance, even simple ones with no filters, so pre-allocation would always be better. But I’m not sure whether that’s actually true.
I’m not sure what you think it means to “fuse” allocation with assignment of the array elements. Under the hood, it allocates first (if the length is known) and then assigns.
No, if the size can be known in advance (i.e. the non-filter case with an iterator of known length), then it allocates all at once.
Technically, an expression like [x for x in 1:n] lowers to something like collect(Base.Generator(x -> x, 1:n)). Since this Generator is an iterator whose Base.IteratorSize is HasShape{1}() (that is, its size is known), collect will allocate the resulting array all at once.
(You can see this if you define f(n) = [x for x in 1:n] and do @code_lowered f(3).)
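A quick way to check this on a recent Julia version (the generator here is just for illustration):
julia> g = Base.Generator(x -> x, 1:5);

julia> Base.IteratorSize(g)     # the shape (and hence the length) is known
Base.HasShape{1}()

julia> collect(g)               # so collect allocates the result in one go
5-element Vector{Int64}:
 1
 2
 3
 4
 5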
Also, when you allocate the full array up front, the array struct and its data tend to be stored together. That is nice for the cache and prefetcher if you plan to iterate over the array from the start. It also prevents unnecessary copies during resizing and does not over-allocate.
In other words, preallocation is always better, but readability is more important in most use cases.
Possible exception: pointer arrays (i.e. you store a non-bitstype element). If the generator’s length is known, the comprehension might be able to avoid the memset-to-zero that an undef allocation of a pointer array requires.
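For concreteness, these are the two constructions being contrasted (a sketch; the element type Vector{Int} is just an example of a non-bitstype):
n = 1_000
A = Vector{Vector{Int}}(undef, n)    # pointer array: the slots are first null-initialized
for i in 1:n
    A[i] = Int[]                     # ...and then each slot is written a second time
end

B = [Int[] for _ in 1:n]             # length known up front: each slot is written exactly once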
Allocation placement:
A = Vector{Int}(undef, 1000);
pointer(A) - pointer_from_objref(A)
0x0000000000000040  # reproducible: the array struct sits exactly one cache line in front of the data

A = Vector{Int}(); for i in 1:100 push!(A, i); end
pointer(A) - pointer_from_objref(A)
0xffffffffffe0c300  # not reproducible
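If you do want to grow an array with push! but roughly know the final length, sizehint! at least avoids the intermediate reallocations (the data still need not end up adjacent to the array struct):
A = Int[]
sizehint!(A, 100)      # reserve capacity up front
for i in 1:100
    push!(A, i)        # no reallocation while staying within the hinted capacity
end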
Regarding my earlier comment about non-preallocation being very bad in the long run:
function prepvec_rnd(n, k)
    A = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        A[i] = Vector{Int}()      # start each inner vector empty
    end
    for j in 1:k
        t = rand(1:n)
        push!(A[t], j)            # grow a randomly chosen inner vector
    end
    A
end
function test_speed(A)
    s = 0
    for i in 1:length(A)          # equivalently: for vec in A
        vec = A[i]
        for j in 1:length(vec)
            s += vec[j]
        end
    end
    s
end
So we build a vector of vectors, where the sizes of the inner vectors are inhomogeneous and not known a priori.
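A minimal way to time that access pattern with BenchmarkTools (the sizes below are arbitrary):
using BenchmarkTools

A = prepvec_rnd(10_000, 1_000_000)
@btime test_speed($A)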
I’m comparing comprehension vs preallocation, and BenchmarkTools shows that comprehensions allocate 5 times more memory and 2016 times (!!!) more frequently than calling zeros(some_length) and then filling this vector.
Code
using BenchmarkTools
const AV{T} = AbstractVector{T};
function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    var = var0
    out = [
        begin
            var = omega + alpha * x^2 + beta * var
            var
        end
        for x in xs
    ]
    @views [var0; out[1:end-1]], var
end;
function fn_mutation(params::AV{T}, xs::AV{<:Real}, var0::Real) where T
    omega, alpha, beta = params
    var = var0
    out = zeros(T, length(xs))
    for (t, x) in enumerate(xs)
        out[t] = var = omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], var
end;
xs = randn(1000);
par = randn(3)
@time fn_compr(par, xs, 1.0)
@time fn_mutation(par, xs, 1.0)
@benchmark fn_compr($par, $xs, 1.0)
@benchmark fn_mutation($par, $xs, 1.0)
Benchmark results
In [21]: @benchmark fn_compr($par, $xs, 1.0)
Out[21]: BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 74.080 μs … 2.238 ms ┊ GC (min … max): 0.00% … 94.39%
Time (median): 79.192 μs ┊ GC (median): 0.00%
Time (mean ± σ): 84.108 μs ± 56.673 μs ┊ GC (mean ± σ): 2.60% ± 3.73%
▂▂ ▇█▅▄▆▅▃▂▂▂▁▃▇▆▄▃▃▃▂▁▃▂▁▁ ▁ ▂
███████████████████████████████████▆▆▅▆▇▆▆▆▇▇▇▆▆▅▄▅▂▅▅▄▄▅▄▅ █
74.1 μs Histogram: log(frequency) by time 113 μs <
Memory estimate: 79.25 KiB, allocs estimate: 4032.
In [22]: @benchmark fn_mutation($par, $xs, 1.0)
Out[22]: BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max): 4.305 μs … 1.235 ms ┊ GC (min … max): 0.00% … 99.15%
Time (median): 5.442 μs ┊ GC (median): 0.00%
Time (mean ± σ): 7.660 μs ± 46.164 μs ┊ GC (mean ± σ): 25.25% ± 4.19%
▂▄▃▃▄▄▅██▇▆▆▆▃▂▁▁▁ ▁ ▁ ▂
████████████████████████▇▆▅▅▄▆▆▆▆▅▅▆▅▅▅▄▅▅▄▅▆▇▆▇▅▇▆▆▇▆▇▇▇▇ █
4.31 μs Histogram: log(frequency) by time 11.8 μs <
Memory estimate: 15.88 KiB, allocs estimate: 2.
fn_mutation allocates 2 times per call and consumes 15.88 KiB of memory per call.
fn_compr allocates 4032 times (!) per call and consumes 79.25 KiB of memory per call.
for x in xs iterates over a vector, so the size of the resulting vector should be known in advance, but apparently it’s not?
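One way to dig into where those allocations come from (a sketch; output elided). On my understanding, the length is known, but var is reassigned inside the comprehension while being captured from the enclosing function, so it gets wrapped in a Core.Box and the element type cannot be inferred:
using InteractiveUtils    # provides @code_warntype outside the REPL

@code_warntype fn_compr(par, xs, 1.0)    # look for var::Core.Box and Any-typed results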
Unfortunately, eltype(xs)[begin stuff end for x in xs] is a syntax error:
In [11]: function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
             omega, alpha, beta = params
             var = var0
             out = eltype(xs)[
                 begin
                     var = omega + alpha * x^2 + beta * var
                     var
                 end
                 for x in xs
             ]
             @views [var0; out[1:end-1]], var
         end;
ERROR: syntax: unexpected "end"
Stacktrace:
[1] top-level scope
@ none:1
[2] top-level scope
@ julia/stdlib/v1.10/REPL/src/REPL.jl:1413
However, eltype(xs)[x for x in xs] works fine:
In [15]: [begin x end for x in xs];
In [16]: eltype(xs)[begin x end for x in xs];
ERROR: syntax: unexpected "end"
Moreover, if I add @info typeof(out) right after the comprehension, it’ll output Vector{Float64}, so it looks like out isn’t a vector of Any.
I must admit that I don’t know why that syntax error occurs, since:
julia> f(x) = eltype(x)[ 1.0*x[i] for i in 1:length(x) ]
f (generic function with 1 method)
julia> f([1,2])
2-element Vector{Int64}:
1
2
julia> f([1.0,2.0])
2-element Vector{Float64}:
1.0
2.0
You can do:
julia> function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
           omega, alpha, beta = params
           var = var0
           out = eltype(xs)[
               (var = omega + alpha * x^2 + beta * var)
               for x in xs
           ]
           @views [var0; out[1:end-1]], var
       end;
julia> @btime fn_compr($par, $xs, 1.0);
3.025 μs (2 allocations: 15.88 KiB)
Which solves the syntax error and, judging by the timing, the allocation problem as well.
I suspect the syntax error comes from the begin ... end block following the element type: eltype(xs)[...] is parsed like an indexing expression, and inside indexing brackets begin and end are reserved to mean the first and last index (as in A[begin] and A[end]), so a begin ... end block can’t appear there directly. At least that’s my guess; whether that counts as a bug or just a parsing quirk, I’m not sure.
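A tiny illustration of how begin and end behave inside indexing brackets:
julia> A = [10, 20, 30];

julia> A[begin], A[end]    # begin/end refer to the first/last index here
(10, 30)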
I think the element type cannot be known at compile time here, and that causes the boxing (in some situations it could be known at compile time, but that probably depends on the inference heuristics). If you provide the type explicitly, it is known for sure, and construction will error if an element cannot be converted to it.
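If you’d rather avoid the captured-and-reassigned var entirely, one possible alternative (a sketch, reusing the AV alias from above) is to thread the state through accumulate, so there is nothing left to box:
function fn_accumulate(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    # out[t] holds the value of var after processing xs[t]
    out = accumulate(xs; init = var0) do var, x
        omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], out[end]
end
This returns the same (shifted vector, final var) pair as fn_compr and fn_mutation.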