Comprehensions versus pre-allocation

When possible, I rewrite array-building code from the “first pre-allocate, then fill in a for loop” pattern into a single comprehension (albeit a fairly complicated one sometimes). In my mind comprehensions seem faster because the pre-allocation step is fused with the actual population of the desired array.

Is this generally true, or are the differences negligible and I’m (sometimes) sacrificing readability for nothing?

Can you give an example where comprehensions are less readable than creating an array and populating it? I usually find that they are eminently readable, and only deviate from comprehensions when the compiler isn’t able to infer the type for some reason (and I can). But that happens less and less these days.

ps = [Vector{Pair{Int, CartesianIndex{2}}}([1 => i]) for i in CartesianRange(size(I1)) if I1[i] == I1max[i] && good_parameters(get_parameters(i, I))]

I would use more line breaks, but to each his own :smile:

I may be missing something, but I don’t see how creating and populating an array would be simpler for this example.

Yeah, I agree that comprehensions are usually prettier and more readable. But beauty aside, are comprehensions also faster?


I had the feeling that comprehensions didn’t know the final size in advance, even simple ones with no filters, so pre-allocation would always be better. But I’m not sure whether that’s actually true.

No.

I’m not sure what you think it means to “fuse” allocation with assignment of the array elements. Under the hood, it allocates first (if the length is known) and then assigns.

No, if the size can be known in advance (i.e. the non-filter case with an iterator of known length), then it allocates all at once.

Technically, an expression like [x for x in 1:n] lowers to something like collect(Base.Generator(x -> x, 1:n)). Since this Generator object is an iterator whose Base.IteratorSize is HasShape (that is, it supports a size function), collect will allocate the resulting array all at once.

(You can see this if you define f(n) = [x for x in 1:n] and do @code_lowered f(3).)
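A quick way to confirm this behavior yourself (a sketch of my own, using the Base names from recent 1.x versions, not code from the posts above):

```julia
# The generator underlying a comprehension reports its size,
# so `collect` can allocate the whole result array up front.
gen = Base.Generator(x -> x^2, 1:10)

Base.IteratorSize(gen)  # Base.HasShape{1}(): the length is known
length(gen)             # 10

# collect therefore produces the same array a comprehension would:
collect(gen) == [x^2 for x in 1:10]  # true
```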


Also, when you preallocate, the array struct and its data tend to be stored together. This is nice for the cache prefetcher if you plan to iterate over the array starting from 1. It also prevents unnecessary moves during resizing and avoids over-allocation.

In other words, preallocation is always at least as good, but readability is more important in most use cases.
Possible exception: pointer arrays (when you store a non-bitstype), where a known-size generator might be able to avoid a memset-to-zero, since the length of the generator is known.
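To illustrate the point about avoiding re-allocation during growth (my own hedged sketch; `sizehint!` is the standard Base tool for reserving capacity):

```julia
# Growing with push! reallocates and moves the data as the capacity
# doubles; reserving capacity up front avoids those reallocations.
grown = Int[]
for i in 1:1_000
    push!(grown, i)        # may trigger several reallocations along the way
end

hinted = Int[]
sizehint!(hinted, 1_000)   # reserve capacity once
for i in 1:1_000
    push!(hinted, i)       # no reallocation needed now
end

grown == hinted == collect(1:1_000)  # true
```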

Allocation placement:

A = Vector{Int}(undef, 1000);
pointer(A) - pointer_from_objref(A)
0x0000000000000040 # reproducible: the struct sits exactly one cache line in front of the data

A = Vector{Int}(); for i in 1:100 push!(A, i); end
pointer(A) - pointer_from_objref(A)
0xffffffffffe0c300 # not reproducible

As I explained above, in cases where the size of the iterator is known, comprehensions pre-allocate too.
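A minimal sanity check of that claim (my own sketch, not from the original post): for an iterator of known length, the comprehension builds the same array as the preallocated loop, allocating the result once.

```julia
# Both versions build the same array; with a known-length iterable the
# comprehension allocates the result in one shot, like the manual version.
n = 1_000

comprehended = [2i for i in 1:n]

preallocated = Vector{Int}(undef, n)
for i in 1:n
    preallocated[i] = 2i
end

comprehended == preallocated  # true
```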

Regarding my earlier comment about non-preallocation being very bad in the long run:

function prepvec_rnd(n, k)
    A = Vector{Vector{Int}}(undef, n)
    for i in 1:n
        A[i] = Vector{Int}()
    end
    for j in 1:k
        t = rand(1:n)
        push!(A[t], j)
    end
    A
end

function test_speed(A)
    s = 0
    for i in 1:length(A)
        vec = A[i]
        for j in 1:length(vec)
            s += vec[j]
        end
    end
    s
end

So, we build a vector-of-vectors, where the size of the inner vectors is inhomogeneous and not a priori known.

A = prepvec_rnd(100_000, 1_000_000); AA = deepcopy(A);
test_speed(A); @time test_speed(A);
  0.006608 seconds (5 allocations: 176 bytes)
test_speed(AA); @time test_speed(AA);
  0.002869 seconds (5 allocations: 176 bytes)

This difference is large enough to often make deepcopies worthwhile, if you amortize over a lot of reads.

Is this still the case in modern (1.9.1) Julia?

I’m comparing comprehension vs preallocation, and BenchmarkTools shows that comprehensions allocate 5 times more memory and 2016 times (!!!) more frequently than calling zeros(some_length) and then filling this vector.

Code

using BenchmarkTools
const AV{T} = AbstractVector{T};

function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
    omega, alpha, beta = params
    var = var0
    out = [
        begin
            var = omega + alpha * x^2 + beta * var
            var
        end
        for x in xs
    ]
    @views [var0; out[1:end-1]], var
end;

function fn_mutation(params::AV{T}, xs::AV{<:Real}, var0::Real) where T
    omega, alpha, beta = params
    var = var0
    out = zeros(T, length(xs))
    for (t, x) in enumerate(xs)
        out[t] = var = omega + alpha * x^2 + beta * var
    end
    @views [var0; out[1:end-1]], var
end;

xs = randn(1000);
par = randn(3)

@time fn_compr(par, xs, 1.0)
@time fn_mutation(par, xs, 1.0)

@benchmark fn_compr($par, $xs, 1.0)
@benchmark fn_mutation($par, $xs, 1.0)

Benchmark results

In [21]: @benchmark fn_compr($par, $xs, 1.0)
Out[21]: BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  74.080 μs …  2.238 ms  ┊ GC (min … max): 0.00% … 94.39%
 Time  (median):     79.192 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   84.108 μs ± 56.673 μs  ┊ GC (mean ± σ):  2.60% ±  3.73%

  ▂▂ ▇█▅▄▆▅▃▂▂▂▁▃▇▆▄▃▃▃▂▁▃▂▁▁   ▁                             ▂
  ███████████████████████████████████▆▆▅▆▇▆▆▆▇▇▇▆▆▅▄▅▂▅▅▄▄▅▄▅ █
  74.1 μs      Histogram: log(frequency) by time       113 μs <

 Memory estimate: 79.25 KiB, allocs estimate: 4032.

In [22]: @benchmark fn_mutation($par, $xs, 1.0)
Out[22]: BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.305 μs …  1.235 ms  ┊ GC (min … max):  0.00% … 99.15%
 Time  (median):     5.442 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   7.660 μs ± 46.164 μs  ┊ GC (mean ± σ):  25.25% ±  4.19%

  ▂▄▃▃▄▄▅██▇▆▆▆▃▂▁▁▁ ▁ ▁                                     ▂
  ████████████████████████▇▆▅▅▄▆▆▆▆▅▅▆▅▅▅▄▅▅▄▅▆▇▆▇▅▇▆▆▇▆▇▇▇▇ █
  4.31 μs      Histogram: log(frequency) by time     11.8 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.
  • fn_mutation allocates 2 times per call and consumes 15.88 KiB of memory per call.
  • fn_compr allocates 4032 times (!) per call and consumes 79.25 KiB of memory per call.

for x in xs iterates over a vector, so the size of the resulting vector should be known in advance, but apparently it’s not?

This can produce an array with element type Any. You have to use, for example, eltype(xs)[... so that the comprehension has a concrete element type.


Unfortunately, eltype(xs)[begin stuff end for x in xs] is a syntax error:

In [11]: function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
             omega, alpha, beta = params
             var = var0
             out = eltype(xs)[
                 begin
                     var = omega + alpha * x^2 + beta * var
                     var
                 end
                 for x in xs
             ]
             @views [var0; out[1:end-1]], var
         end;
ERROR: syntax: unexpected "end"
Stacktrace:
 [1] top-level scope
   @ none:1
 [2] top-level scope
   @ julia/stdlib/v1.10/REPL/src/REPL.jl:1413

However, eltype(xs)[x for x in xs] works fine:

In [15]: [begin x end for x in xs];

In [16]: eltype(xs)[begin x end for x in xs];
ERROR: syntax: unexpected "end"

Moreover, if I add @info typeof(out) right after the comprehension, it’ll output Vector{Float64}, so it looks like out isn’t a vector of Any.

I must admit that I don’t know why that syntax error occurs, since:

julia> f(x) = eltype(x)[ 1.0*x[i] for i in 1:length(x) ]
f (generic function with 1 method)

julia> f([1,2])
2-element Vector{Int64}:
 1
 2

julia> f([1.0,2.0])
2-element Vector{Float64}:
 1.0
 2.0

You can do:


julia> function fn_compr(params::AV{<:Real}, xs::AV{<:Real}, var0::Real)
           omega, alpha, beta = params
           var = var0
           out = eltype(xs)[
               (var = omega + alpha * x^2 + beta * var)
               for x in xs
           ]
           @views [var0; out[1:end-1]], var
       end;

julia> @btime fn_compr($par, $xs, 1.0);
  3.025 μs (2 allocations: 15.88 KiB)

Which solves the syntax error.

The syntax error is related to the begin ... end block combined with the type annotation, and is, perhaps, a bug? Not sure.

I think the compiler cannot know the element type at compile time, and that causes the boxing (in some situations it could know, but that probably depends on the inference heuristics). If you provide the type, it is known for sure, and construction will error if an element can’t be converted.
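The boxing here is the usual captured-variable issue from the performance tips: `var` is reassigned inside the comprehension’s hidden closure, which can defeat inference. A minimal sketch of the pattern (function names are mine, purely illustrative):

```julia
# `acc` is captured by the comprehension's implicit closure *and* reassigned,
# which is exactly the pattern that can defeat type inference.
function running_sum(xs)
    acc = zero(eltype(xs))
    [(acc = acc + x) for x in xs]  # each element is the updated `acc`
end

running_sum([1.0, 2.0, 3.0])  # [1.0, 3.0, 6.0]

# Annotating the element type sidesteps the inference problem:
function running_sum_typed(xs)
    acc = zero(eltype(xs))
    eltype(xs)[(acc = acc + x) for x in xs]
end
```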

Amazing! This does completely remove all unnecessary allocations, indeed. Thank you!

However, it seems like the syntax error is a bug:

  1. expr1[expr2 for expr3 in expr4] is valid syntax
  2. [expr2 for expr3 in expr4] is valid syntax
  3. [begin expr2 end for expr3 in expr4] is valid syntax
  4. For consistency, expr1[begin expr2 end for expr3 in expr4] should be valid syntax as well, but it’s not…

BTW, NamedTuple-like syntax works fine with multiple statements/expressions as well:

P[
    (
        var = omega + alpha * x^2 + beta * var;
        var
    )
    for x in xs
]

I was under the impression that begin x; y end and (x; y) were syntactically equivalent, but apparently not.
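A quick check with Meta.parse (my own sketch) shows the two block forms do parse to the same expression head on their own; the difference only bites inside a typed comprehension:

```julia
# Standalone, both block forms parse to a :block expression…
Meta.parse("begin x; y end").head  # :block
Meta.parse("(x; y)").head          # :block

# …but inside a typed comprehension, only the parenthesized form is accepted
# on the affected versions:
Meta.parse("T[(x; y) for x in xs]")  # parses fine
# Meta.parse("T[begin x; y end for x in xs]")  # ParseError on affected versions
```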


Bug filed: Syntax error with comprehension with begin ... end block with annotated type · Issue #50265 · JuliaLang/julia · GitHub

(we’ll see if that’s something known already)
