When possible, I rewrite the calculation of an array from the “pre-allocate, then fill in a for-loop” pattern into a single comprehension (sometimes a fairly complicated one). In my mind comprehensions seem faster because the pre-allocation step is fused with the actual population of the desired array.
Is this generally true or are the differences infinitesimal and I’m (sometimes) sacrificing readability for nothing?
Can you give an example where comprehensions are less readable than creating an array and populating it? I usually find that they are eminently readable, and only deviate from comprehensions when the compiler isn’t able to infer the type for some reason (and I can). But that happens less and less these days.
I had the feeling that comprehensions don’t know the final size in advance, even simple ones with no filters, so pre-allocation would always be better. But I’m not sure whether that’s actually true.
I’m not sure what you think it means to “fuse” allocation with assignment of the array elements. Under the hood, it allocates first (if the length is known) and then assigns.
No, if the size can be known in advance (i.e. the non-filter case with an iterator of known length), then it allocates all at once.
Technically, an expression like [x for x in 1:n] lowers to something like collect(Base.Generator(x -> x, 1:n)). Since this Generator is an iterator whose Base.IteratorSize is HasShape{1}() (that is, its size is known), collect will allocate the resulting array all at once.
(You can see this if you define f(n) = [x for x in 1:n] and do @code_lowered f(3).)
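A quick way to check this on a recent Julia version (the generator here is just for illustration):
julia> g = Base.Generator(x -> x, 1:5);

julia> Base.IteratorSize(g)     # the shape (and hence the length) is known
Base.HasShape{1}()

julia> collect(g)               # so collect allocates the result in one go
5-element Vector{Int64}:
 1
 2
 3
 4
 5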
Also, when you allocate the full array up front, the array struct and its data tend to be stored together. That is nice for the cache and prefetcher if you plan to iterate over the array from the start. It also prevents unnecessary copies during resizing and does not over-allocate.
In other words, preallocation is always better, but readability is more important in most use cases.
Possible exception: pointer arrays (i.e. you store a non-bitstype element). If the generator’s length is known, the comprehension might be able to avoid the memset-to-zero that an undef allocation of a pointer array requires.
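For concreteness, these are the two constructions being contrasted (a sketch; the element type Vector{Int} is just an example of a non-bitstype):
n = 1_000
A = Vector{Vector{Int}}(undef, n)    # pointer array: the slots are first null-initialized
for i in 1:n
    A[i] = Int[]                     # ...and then each slot is written a second time
end

B = [Int[] for _ in 1:n]             # length known up front: each slot is written exactly once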
Allocation placement:
A = Vector{Int}(undef, 1000);
pointer(A) - pointer_from_objref(A)
0x0000000000000040  # reproducible: the array struct sits exactly one cache line in front of the data

A = Vector{Int}(); for i in 1:100 push!(A, i); end
pointer(A) - pointer_from_objref(A)
0xffffffffffe0c300  # not reproducible
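If you do want to grow an array with push! but roughly know the final length, sizehint! at least avoids the intermediate reallocations (the data still need not end up adjacent to the array struct):
A = Int[]
sizehint!(A, 100)      # reserve capacity up front
for i in 1:100
    push!(A, i)        # no reallocation while staying within the hinted capacity
end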
Regarding my earlier comment about non-preallocation being very bad in the long run:
function prepvec_rnd(n, k)
    A = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        A[i] = Vector{Int}()      # start each inner vector empty
    end
    for j in 1:k
        t = rand(1:n)
        push!(A[t], j)            # grow a randomly chosen inner vector
    end
    A
end
function test_speed(A)
    s = 0
    for i in 1:length(A)          # equivalently: for vec in A
        vec = A[i]
        for j in 1:length(vec)
            s += vec[j]
        end
    end
    s
end
So we build a vector of vectors, where the sizes of the inner vectors are inhomogeneous and not known a priori.
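A minimal way to time that access pattern with BenchmarkTools (the sizes below are arbitrary):
using BenchmarkTools

A = prepvec_rnd(10_000, 1_000_000)
@btime test_speed($A)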
I’m comparing comprehension vs preallocation, and BenchmarkTools shows that comprehensions allocate 5 times more memory and 2016 times (!!!) more frequently than calling zeros(some_length) and then filling this vector.
Code
using BenchmarkTools
const AV{T} = AbstractVector{T};
function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    var = var0
    out = [
        begin
            var = omega + alpha * x^2 + beta * var
            var
        end
        for x in xs
    ]
    @views [var0; out[1:end-1]], var
end;
function fn_mutation(params::AV{T}, xs::AV{<:Real}, var0::Real) where T
    omega, alpha, beta = params
    var = var0
    out = zeros(T, length(xs))
    for (t, x) in enumerate(xs)
        out[t] = var = omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], var
end;
xs = randn(1000);
par = randn(3)
@time fn_compr(par, xs, 1.0)
@time fn_mutation(par, xs, 1.0)
@benchmark fn_compr($par, $xs, 1.0)
@benchmark fn_mutation($par, $xs, 1.0)
Benchmark results
In [21]: @benchmark fn_compr($par, $xs, 1.0)
Out[21]: BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 74.080 μs … 2.238 ms ┊ GC (min … max): 0.00% … 94.39%
Time (median): 79.192 μs ┊ GC (median): 0.00%
Time (mean ± σ): 84.108 μs ± 56.673 μs ┊ GC (mean ± σ): 2.60% ± 3.73%
▂▂ ▇█▅▄▆▅▃▂▂▂▁▃▇▆▄▃▃▃▂▁▃▂▁▁ ▁ ▂
███████████████████████████████████▆▆▅▆▇▆▆▆▇▇▇▆▆▅▄▅▂▅▅▄▄▅▄▅ █
74.1 μs Histogram: log(frequency) by time 113 μs <
Memory estimate: 79.25 KiB, allocs estimate: 4032.
In [22]: @benchmark fn_mutation($par, $xs, 1.0)
Out[22]: BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max): 4.305 μs … 1.235 ms ┊ GC (min … max): 0.00% … 99.15%
Time (median): 5.442 μs ┊ GC (median): 0.00%
Time (mean ± σ): 7.660 μs ± 46.164 μs ┊ GC (mean ± σ): 25.25% ± 4.19%
▂▄▃▃▄▄▅██▇▆▆▆▃▂▁▁▁ ▁ ▁ ▂
████████████████████████▇▆▅▅▄▆▆▆▆▅▅▆▅▅▅▄▅▅▄▅▆▇▆▇▅▇▆▆▇▆▇▇▇▇ █
4.31 μs Histogram: log(frequency) by time 11.8 μs <
Memory estimate: 15.88 KiB, allocs estimate: 2.
fn_mutation allocates 2 times per call and consumes 15.88 KiB of memory per call.
fn_compr allocates 4032 times (!) per call and consumes 79.25 KiB of memory per call.
for x in xs iterates over a vector, so the size of the resulting vector should be known in advance, but apparently it’s not?
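One way to dig into where those allocations come from (a sketch; output elided). On my understanding, the length is known, but var is reassigned inside the comprehension while being captured from the enclosing function, so it gets wrapped in a Core.Box and the element type cannot be inferred:
using InteractiveUtils    # provides @code_warntype outside the REPL

@code_warntype fn_compr(par, xs, 1.0)    # look for var::Core.Box and Any-typed results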
Unfortunately, eltype(xs)[begin stuff end for x in xs] is a syntax error:
In [11]: function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
             omega, alpha, beta = params
             var = var0
             out = eltype(xs)[
                 begin
                     var = omega + alpha * x^2 + beta * var
                     var
                 end
                 for x in xs
             ]
             @views [var0; out[1:end-1]], var
         end;
ERROR: syntax: unexpected "end"
Stacktrace:
[1] top-level scope
@ none:1
[2] top-level scope
@ julia/stdlib/v1.10/REPL/src/REPL.jl:1413
However, eltype(xs)[x for x in xs] works fine:
In [15]: [begin x end for x in xs];
In [16]: eltype(xs)[begin x end for x in xs];
ERROR: syntax: unexpected "end"
Moreover, if I add @info typeof(out) right after the comprehension, it’ll output Vector{Float64}, so it looks like out isn’t a vector of Any.
I must admit that I don’t know why that syntax error occurs, since:
julia> f(x) = eltype(x)[ 1.0*x[i] for i in 1:length(x) ]
f (generic function with 1 method)
julia> f([1,2])
2-element Vector{Int64}:
1
2
julia> f([1.0,2.0])
2-element Vector{Float64}:
1.0
2.0
You can do:
julia> function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
           omega, alpha, beta = params
           var = var0
           out = eltype(xs)[
               (var = omega + alpha * x^2 + beta * var)
               for x in xs
           ]
           @views [var0; out[1:end-1]], var
       end;
julia> @btime fn_compr($par, $xs, 1.0);
3.025 μs (2 allocations: 15.88 KiB)
Which solves the syntax error and, judging by the timing, the allocation problem as well.
I suspect the syntax error comes from the begin ... end block following the element type: eltype(xs)[...] is parsed like an indexing expression, and inside indexing brackets begin and end are reserved to mean the first and last index (as in A[begin] and A[end]), so a begin ... end block can’t appear there directly. At least that’s my guess; whether that counts as a bug or just a parsing quirk, I’m not sure.
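A tiny illustration of how begin and end behave inside indexing brackets:
julia> A = [10, 20, 30];

julia> A[begin], A[end]    # begin/end refer to the first/last index here
(10, 30)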
I think the element type cannot be known at compile time here, and that causes the boxing (in some situations it could be known at compile time, but that probably depends on the inference heuristics). If you provide the type explicitly, it is known for sure, and construction will error if an element cannot be converted to it.
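If you’d rather avoid the captured-and-reassigned var entirely, one possible alternative (a sketch, reusing the AV alias from above) is to thread the state through accumulate, so there is nothing left to box:
function fn_accumulate(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    # out[t] holds the value of var after processing xs[t]
    out = accumulate(xs; init = var0) do var, x
        omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], out[end]
end
This returns the same (shifted vector, final var) pair as fn_compr and fn_mutation.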