Allocations in Comprehensions


I wonder if there is a simple way to construct a container (Array or Tuple) with comprehensions or similar but with the performance of a hand-written container:

julia> @time [fill(1, 1000), fill(1, 1000), fill(1, 1000), fill(1, 1000)]
  0.000024 seconds (9 allocations: 32.016 KiB)

julia> @time [fill(1, 1000) for i in 1:4]
  0.101525 seconds (49.46 k allocations: 2.536 MiB)

I am also asking whether there is a way to declare variables in a @threads for-loop independently on every thread. I arrived at the example above while working around the latter problem.

You’re just measuring compilation time here. A comprehension creates a function to do its work, so each time you run one at the REPL, a new function is created and has to be compiled. When the comprehension is inside a function, you pay that cost only once, on the first call:

julia> f() = [fill(1, 1000) for i in 1:4]
f (generic function with 1 method)

julia> @time f();
  0.049694 seconds (102.72 k allocations: 5.554 MiB)

julia> @time f();
  0.000039 seconds (9 allocations: 32.016 KiB)

I highly recommend using the BenchmarkTools package; it deals with many of these intricacies for you.
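As a rough illustration of that recommendation, here is a minimal sketch of benchmarking the comprehension with BenchmarkTools. The function name `build_array` and the variable `n` are made up for this example, and the package is assumed to be installed already.

```julia
using BenchmarkTools  # third-party package, assumed to be installed

# Hypothetical name; the body is the comprehension from the question.
build_array() = [fill(1, 1000) for i in 1:4]

# @btime runs the call many times and reports the minimum time and the
# allocations, so the one-off compilation cost does not show up in the result.
@btime build_array();

# Interpolate non-constant globals with `$` so the benchmark does not also
# measure the overhead of reading an untyped global variable.
n = 1000
@btime [fill(1, $n) for i in 1:4];
```

The numbers you get will depend on your machine and Julia version, so treat them as relative rather than absolute.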


For tuples, the situation is a bit more complicated, since you usually want the compiler to infer the full tuple type (including its length).


Hand-written version:

julia> using Test: @inferred
julia> using BenchmarkTools: @btime

julia> f1() = (fill(1,1000), fill(1,1000), fill(1,1000), fill(1,1000))
f1 (generic function with 1 method)

julia> @inferred f1();

julia> @btime f1();
  2.575 μs (5 allocations: 31.80 KiB)

Generator passed to Tuple (the closest thing to a tuple comprehension): not only do you lose a bit of performance but, more importantly, the return type is no longer inferred precisely (the tuple length is not deduced):

julia> f2() = Tuple(fill(1,1000) for i in 1:4)
f2 (generic function with 1 method)

julia> @inferred f2();
ERROR: return type NTuple{4,Array{Int64,1}} does not match inferred return type Tuple{Vararg{Array{Int64,1},N} where N}

julia> @btime f2();
  3.451 μs (6 allocations: 31.91 KiB)
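If you want to look at what the compiler infers rather than only getting an error from @inferred, one possible sketch is to query the inferred return type directly. This uses the reflection helper `Base.return_types`; what it reports for `f2` may differ between Julia versions, so no output is shown here.

```julia
f1() = (fill(1, 1000), fill(1, 1000), fill(1, 1000), fill(1, 1000))
f2() = Tuple(fill(1, 1000) for i in 1:4)

# Base.return_types returns the inferred return type of each matching method;
# both functions have exactly one zero-argument method, so take the first entry.
inferred_f1 = first(Base.return_types(f1, ()))
inferred_f2 = first(Base.return_types(f2, ()))

# A concrete return type means the tuple length and element types are fully known.
println("f1: ", inferred_f1, "  concrete? ", isconcretetype(inferred_f1))
println("f2: ", inferred_f2, "  concrete? ", isconcretetype(inferred_f2))
```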

To get the nice properties of the hand-written form in a more concise and DRY way, you can use the ntuple function:

julia> f3() = ntuple(_->fill(1,1000), 4)
f3 (generic function with 1 method)

julia> @inferred f3();

julia> @btime f3();
  2.460 μs (5 allocations: 31.80 KiB)
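When the length is not a literal in the function body but comes in as a parameter, a variation that may help is to pass it as a `Val`, so the length is part of the argument type. This is only a sketch; `make_buffers` is a hypothetical name, not something from the posts above.

```julia
using Test: @inferred

# Hypothetical helper: the length N travels in the type domain via Val,
# so the compiler knows the tuple length from the argument type alone.
make_buffers(::Val{N}) where {N} = ntuple(_ -> fill(1, 1000), Val(N))

# @inferred throws if the actual return type differs from the inferred one.
bufs = @inferred make_buffers(Val(4))
println(typeof(bufs))
```

Each element is a separate array, since the closure passed to ntuple calls fill once per index.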