# Comprehensions versus pre-allocation

When possible, I rewrite the calculation of some array from the “first pre-allocate, then fill in a for-loop” pattern into a single comprehension (albeit a fairly complicated one sometimes). In my mind, comprehensions seem faster because the pre-allocation step is fused with the actual population of the desired array.

Is this generally true, or are the differences negligible and I’m (sometimes) sacrificing readability for nothing?

Can you give an example where comprehensions are less readable than creating an array and populating it? I usually find that they are eminently readable, and only deviate from comprehensions when the compiler isn’t able to infer the type for some reason (and I can). But that happens less and less these days.

``````ps = [Vector{Pair{Int, CartesianIndex{2}}}([1 => i]) for i in CartesianRange(size(I1)) if I1[i] == I1max[i] && good_parameters(get_parameters(i, I))]
``````

I would use more line breaks, but to each his own.

I may be missing something, but I don’t see how creating and populating an array would be simpler for this example.

Yeah, I agree that comprehensions are usually prettier and more readable. But beauty aside, are comprehensions also faster?

I had the feeling that comprehensions didn’t know the final size in advance, even simple ones with no filters, hence pre-allocation would always be better. But I have my doubts about whether that is actually true.

No.

I’m not sure what you think it means to “fuse” allocation with assignment of the array elements. Under the hood, it allocates first (if the length is known) and then assigns.

No, if the size can be known in advance (i.e. the non-filter case with an iterator of known length), then it allocates all at once.

Technically, an expression like `[x for x in 1:n]` lowers to something like `collect(Base.Generator(x -> x, 1:n))`. Since this `Generator` object is an iterator whose `Base.IteratorSize` is `HasShape` (that is, it supports a `size` function), `collect` will allocate the resulting array all at once.

(You can see this if you define `f(n) = [x for x in 1:n]` and do `@code_lowered f(3)`.)
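This size information can also be inspected directly. A minimal sketch (using the standard `Base.IteratorSize` trait) showing that a plain generator over a range knows its shape, while a filtered one does not:

```julia
# A generator over a range of known length reports HasShape, so collect
# (and hence a comprehension) can allocate the whole result up front.
# Adding a filter clause discards that size information.
g_plain    = (x^2 for x in 1:10)
g_filtered = (x^2 for x in 1:10 if isodd(x))

Base.IteratorSize(g_plain)     # Base.HasShape{1}()
Base.IteratorSize(g_filtered)  # Base.SizeUnknown()
```

This is exactly why the filtered case falls back to incremental growth: `collect` simply has no length to pre-allocate with.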

Also, when you preallocate, the array struct and its data tend to be stored together. This is nice for the hardware prefetcher if you plan to iterate over the array from the start. Preallocating also prevents unnecessary moves during resizing and avoids over-allocating.

In other words, preallocation is always at least as good, but readability is more important in most use cases.
Possible exception: pointer arrays (where you store a non-bitstype), for which a known-length generator might be able to avoid a memset-to-zero.
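Relatedly, when you do know the final length but still want the incremental `push!` style, `sizehint!` (a standard `Base` function) gives you the single up-front allocation anyway; a minimal sketch:

```julia
# Reserve capacity once, then push!: the buffer is not repeatedly
# grown and copied, and the result is identical to a comprehension.
function build_with_hint(n)
    A = Int[]
    sizehint!(A, n)   # one allocation up front
    for i in 1:n
        push!(A, i)
    end
    return A
end

build_with_hint(5)  # [1, 2, 3, 4, 5]
```

This keeps the loop readable while removing the grow-and-copy overhead discussed above.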

Allocation placement:

``````A = Vector{Int}(undef, 1000);
pointer(A) - pointer_from_objref(A)
0x0000000000000040 # reproducible: the array struct sits exactly one cache line in front of the data

A = Vector{Int}(); for i in 1:100 push!(A, i); end
pointer(A) - pointer_from_objref(A)
0xffffffffffe0c300 # not reproducible
``````

As I explained above, in cases where the size of the iterator is known, comprehensions pre-allocate too.

Regarding my earlier comment about non-preallocation being very bad in the long run:

``````function prepvec_rnd(n, k)
    A = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        A[i] = Vector{Int}()
    end
    for j in 1:k
        t = rand(1:n)
        push!(A[t], j)
    end
    A
end

function test_speed(A)
    s = 0
    for i in 1:length(A)
        vec = A[i]
        for j in 1:length(vec)
            s += vec[j]
        end
    end
    s
end
``````

So, we build a vector-of-vectors, where the size of the inner vectors is inhomogeneous and not a priori known.

``````A = prepvec_rnd(100_000, 1_000_000); AA = deepcopy(A);
test_speed(A); @time test_speed(A);
0.006608 seconds (5 allocations: 176 bytes)
test_speed(AA); @time test_speed(AA);
0.002869 seconds (5 allocations: 176 bytes)
``````

This difference is large enough to often make deepcopies worthwhile, if you amortize over a lot of reads.
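A quick sanity check that such a `deepcopy` preserves the contents while giving the inner vectors fresh (typically more contiguous) buffers; timings vary by machine, so none are shown here:

```julia
# deepcopy allocates new buffers for the inner vectors, which is what
# improves locality for subsequent sequential reads.
A  = [rand(1:100, rand(1:10)) for _ in 1:1_000]
AA = deepcopy(A)

A == AA         # true:  same contents
A[1] === AA[1]  # false: distinct inner buffers
```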

Is this still the case in modern (1.9.1) Julia?

I’m comparing comprehension vs preallocation, and `BenchmarkTools` shows that comprehensions allocate 5 times more memory and 2016 times (!!!) more frequently than calling `zeros(some_length)` and then filling this vector.

## Code

``````using BenchmarkTools
const AV{T} = AbstractVector{T};

function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    var = var0
    out = [
        begin
            var = omega + alpha * x^2 + beta * var
            var
        end
        for x in xs
    ]
    @views [var0; out[1:end-1]], var
end;

function fn_mutation(params::AV{T}, xs::AV{<:Real}, var0::Real) where T
    omega, alpha, beta = params
    var = var0
    out = zeros(T, length(xs))
    for (t, x) in enumerate(xs)
        out[t] = var = omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], var
end;

xs = randn(1000);
par = randn(3);

@time fn_compr(par, xs, 1.0)
@time fn_mutation(par, xs, 1.0)

@benchmark fn_compr($par, $xs, 1.0)
@benchmark fn_mutation($par, $xs, 1.0)
``````

## Benchmark results

``````In [21]: @benchmark fn_compr($par, $xs, 1.0)
Out[21]: BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max):  74.080 μs …  2.238 ms  ┊ GC (min … max): 0.00% … 94.39%
Time  (median):     79.192 μs              ┊ GC (median):    0.00%
Time  (mean ± σ):   84.108 μs ± 56.673 μs  ┊ GC (mean ± σ):  2.60% ±  3.73%

▂▂ ▇█▅▄▆▅▃▂▂▂▁▃▇▆▄▃▃▃▂▁▃▂▁▁   ▁                             ▂
███████████████████████████████████▆▆▅▆▇▆▆▆▇▇▇▆▆▅▄▅▂▅▅▄▄▅▄▅ █
74.1 μs      Histogram: log(frequency) by time       113 μs <

Memory estimate: 79.25 KiB, allocs estimate: 4032.

In [22]: @benchmark fn_mutation($par, $xs, 1.0)
Out[22]: BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max):  4.305 μs …  1.235 ms  ┊ GC (min … max):  0.00% … 99.15%
Time  (median):     5.442 μs              ┊ GC (median):     0.00%
Time  (mean ± σ):   7.660 μs ± 46.164 μs  ┊ GC (mean ± σ):  25.25% ±  4.19%

▂▄▃▃▄▄▅██▇▆▆▆▃▂▁▁▁ ▁ ▁                                     ▂
████████████████████████▇▆▅▅▄▆▆▆▆▅▅▆▅▅▅▄▅▅▄▅▆▇▆▇▅▇▆▆▇▆▇▇▇▇ █
4.31 μs      Histogram: log(frequency) by time     11.8 μs <

Memory estimate: 15.88 KiB, allocs estimate: 2.
``````
• `fn_mutation` allocates 2 times per call and consumes 15.88 KiB of memory per call.
• `fn_compr` allocates 4032 times (!) per call and consumes 79.25 KiB of memory per call.

`for x in xs` iterates over a vector, so the size of the resulting vector should be known in advance, but apparently it’s not?

This initializes an array of type `Any`. You have to use, for example, `eltype(xs)[...` so that the comprehension has a concrete element type.
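For illustration, here is what a type-prefixed comprehension does in a simple case (a minimal sketch, not the poster’s exact code):

```julia
# A type prefix fixes the comprehension's element type up front,
# independent of what inference deduces for the body.
xs = [1, 2, 3]
b = Float64[x for x in xs]  # forced element type

typeof(b)  # Vector{Float64}
```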

Unfortunately, `eltype(xs)[begin stuff end for x in xs]` is a syntax error:

``````In [11]: function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
             omega, alpha, beta = params
             var = var0
             out = eltype(xs)[
                 begin
                     var = omega + alpha * x^2 + beta * var
                     var
                 end
                 for x in xs
             ]
             @views [var0; out[1:end-1]], var
         end;
ERROR: syntax: unexpected "end"
Stacktrace:
[1] top-level scope
@ none:1
[2] top-level scope
@ julia/stdlib/v1.10/REPL/src/REPL.jl:1413
``````

However, `eltype(xs)[x for x in xs]` works fine; it’s only the combination of a type prefix and a `begin ... end` block that fails:

``````In [15]: [begin x end for x in xs];

In [16]: eltype(xs)[begin x end for x in xs];
ERROR: syntax: unexpected "end"
``````

Moreover, if I add `@info typeof(out)` right after the comprehension, it’ll output `Vector{Float64}`, so it looks like `out` isn’t a vector of `Any`.

I must admit that I don’t know why that syntax error occurs, since:

``````julia> f(x) = eltype(x)[ 1.0*x[i] for i in 1:length(x) ]
f (generic function with 1 method)

julia> f([1,2])
2-element Vector{Int64}:
 1
 2

julia> f([1.0,2.0])
2-element Vector{Float64}:
 1.0
 2.0
``````

You can do:

``````
julia> function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
           omega, alpha, beta = params
           var = var0
           out = eltype(xs)[
               (var = omega + alpha * x^2 + beta * var)
               for x in xs
           ]
           @views [var0; out[1:end-1]], var
       end;

julia> @btime fn_compr($par, $xs, 1.0);
  3.025 μs (2 allocations: 15.88 KiB)
``````

Which solves the syntax error.

The syntax error is related to the `begin ... end` block combined with the type prefix, and is, perhaps, a bug? Not sure.

I think it cannot know the element type at compile time, and that causes the boxing (in some situations it could be known, but that probably depends on the inference heuristics). If you provide the type, it is known for sure, and construction will error if a value cannot be converted.
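The boxing itself can be observed on any closure that reassigns a captured variable, which is essentially what the comprehension body above does with `var`. This sketch inspects internals (`Core.Box` and the closure’s field layout), so treat it as a diagnostic, not a guaranteed API:

```julia
# `c` is reassigned inside the closure, so lowering captures it as a
# mutable Core.Box rather than a plain value -- each access goes
# through the box, defeating type inference.
function make_counter()
    c = 0
    inc() = (c += 1)
    return inc
end

f = make_counter()
fieldtype(typeof(f), :c)  # Core.Box
```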

Amazing! This does completely remove all unnecessary allocations, indeed. Thank you!

However, it seems like the syntax error is a bug:

1. `expr1[expr2 for expr3 in expr4]` is valid syntax
2. `[expr2 for expr3 in expr4]` is valid syntax
3. `[begin expr2 end for expr3 in expr4]` is valid syntax
4. For consistency, `expr1[begin expr2 end for expr3 in expr4]` should be valid syntax as well, but it’s not…

BTW, `NamedTuple`-like syntax works fine with multiple statements/expressions as well:

``````P[
(
var = omega + alpha * x^2 + beta * var;
var
)
for x in xs
]
``````

I was under the impression that `begin x; y end` and `(x; y)` were syntactically equivalent, but apparently not.
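One plausible reading (my own guess, not confirmed in this thread): inside an indexing-style bracket expression such as `T[...]`, `begin` and `end` are reserved as shorthand for the first and last index, which would explain why the keyword block parses in a bare `[...]` but not after a type prefix:

```julia
# Inside A[...], `begin` and `end` are index shorthands,
# not block keywords.
A = [10, 20, 30]
A[begin]  # 10  (firstindex(A))
A[end]    # 30  (lastindex(A))
```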

(we’ll see if that’s something known already)