Comprehensions versus pre-allocation

question

#1

When it’s possible, I rewrite the calculation of some array from the “first pre-allocation then for-loop” to just a comprehension (albeit fairly complicated sometimes). In my mind comprehensions seem faster because the pre-allocation step is fused with the actual population of the desired array.

Is this generally true or are the differences infinitesimal and I’m (sometimes) sacrificing readability for nothing?


#2

Can you give an example where comprehensions are less readable than creating an array and populating it? I usually find that they are eminently readable, and only deviate from comprehensions when the compiler isn’t able to infer the type for some reason (and I can). But that happens less and less these days.


#3
ps = [Vector{Pair{Int, CartesianIndex{2}}}([1 => i]) for i in CartesianRange(size(I1)) if I1[i] == I1max[i] && good_parameters(get_parameters(i, I))]

#4

I would use more line breaks, but each to his own :smile:

It may be missing something, but I don’t see how creating and populating an array would be simpler for this example.


#5

Yea, I agree that comprehensions are usually prettier and more readable. But beauty aside, are comprehension also faster?


#6

I had the feeling that comprehensions didn’t know the final size in advance, even in simple ones when there are no filters, hence pre-allocation would always be better. But I also have a doubt whether that is true or not.


#7

No.

I’m not sure what you think it means to “fuse” allocation with assignment of the array elements. Under the hood, it allocates first (if the length is known) and then assigns.

No, if the size can be known in advance (i.e. the non-filter case with an iterator of known length), then it allocates all at once.

Technically, an expression like [x for x in 1:n] lowers to something like collect(Generator(i -> x, 1:n)). Since this Generator object is an iterator with iteratorsize(...) equal to HasShape() (that is, it supports a size function), then collect will allocate the resulting array all at once.

(You can see this if you define f(n) = [x for x in 1:n] and do @code_lowered f(3).)


#8

Also, when you preallocate then the array and its data tend to be stored together. This is nice for the cache predictor if you plan to iterate over the array starting from 1. Also, it will prevent unnecessary moves during resizing and will not over-allocate.

In other words, preallocation is always better, but readability is more important in most usecases.
Possible exception: Pointer-arrays (you store a non-bitstype), a known-size generator might be able to avoid a memset-zero if the length of the generator is known.

Allocation placement:

A = Vector{Int}(1000);
pointer(A)- pointer_from_objref(A)
0x0000000000000040 #reproducible, the struct is exactly one cache-line in front of the data

A = Vector{Int}(); for i in 1:100 push!(A,i); end
pointer(A)- pointer_from_objref(A)
0xffffffffffe0c300 #not reproducible

#9

As I explained above, in cases where the size of the iterator is known, comprehensions pre-allocate too.


#10

Regarding my earlier comment about non-preallocation being very bad in the long run:

function prepvec_rnd(n, k)
    A = Vector{Vector{Int}}(n)
    for i in 1:n
        A[i] = Vector{Int}()
    end
    for j in 1:k
        t = rand(1:n)
        push!(A[t], j)
    end
    A
end

function test_speed(A)
    s = 0
    for i=1:length(A) # in A
        vec = A[i]
        for j = 1:length(vec)
            s+=vec[j]
        end
    end
    s
end

So, we build a vector-of-vectors, where the size of the inner vectors is inhomogeneous and not a priori known.

A = prepvec_rnd(10_0000, 100_0000); AA = deepcopy(A);
test_speed(A); @time test_speed(A);
  0.006608 seconds (5 allocations: 176 bytes)
test_speed(AA); @time test_speed(AA);
  0.002869 seconds (5 allocations: 176 bytes)

This difference is large enough to often make deepcopies worthwhile, if you amortize over a lot of reads.