Comprehension vs map and filter: unexpected speeds

Hello, I have the following comparison and I wonder why the last way to achieve the result is the fastest, since it should create an additional array whereas the first two don’t.

julia> const A = rand(10000);
WARNING: redefining constant A

julia> @time [a^2 for a ∈ A if a > 0.5];
  0.138524 seconds (80.82 k allocations: 3.978 MiB)

julia> @time map(x -> x^2, filter(x -> x > 0.5, A));
  0.130198 seconds (69.27 k allocations: 3.601 MiB)

julia> @time [a^2 for a ∈ A[A .> 0.5]];
  0.099821 seconds (51.53 k allocations: 2.623 MiB)

I have tested it with a few array sizes and several tries – the behavior remains the same…

You want to use @btime, notice the b

Yeah, I have tried it, but the difference is even more pronounced:

julia> const A = rand(10000);
WARNING: redefining constant A

julia> using BenchmarkTools

julia> @btime [a^2 for a ∈ A if a > 0.5];
  188.852 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.5, A));
  193.042 μs (16 allocations: 167.36 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.5]];
  88.000 μs (10 allocations: 83.14 KiB)

The problem here is probably that @btime explicitly tries not to measure the time for creating A[A .> 0.5]. But in that case the comparison would be unfair.

I forgot to remove BenchmarkTools in the previous example.

I don’t think that’s the case. It would be the case if you interpolated that expression, which you don’t.
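
To illustrate the distinction (a sketch, timings omitted):

using BenchmarkTools

# Not interpolated: A[A .> 0.5] is re-evaluated inside every sample, so it is part of the measurement.
@btime [a^2 for a ∈ A[A .> 0.5]];

# Interpolated: the indexing is evaluated once when the benchmark is set up and excluded from the timing.
@btime [a^2 for a ∈ $(A[A .> 0.5])];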

However, the time is roughly half in the last case, so something quite significant is being optimized here.

I have tried removing the ^ and the result persists:

julia> @btime [a for a ∈ A if a > 0.5];
  186.337 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x, filter(x -> x > 0.5, A));
  192.763 μs (16 allocations: 167.36 KiB)

julia> @btime [a for a ∈ A[A .> 0.5]];
  88.280 μs (10 allocations: 83.14 KiB)

The first one is probably slow because the length of the final result is unknown, whereas it is known in the last two. Not sure why the second is slower than the third, though.

Ah, the filter does not know the length of the result either. The binary indexing probably does, then…?

I don’t know this one; could you explain, please?

I was referring to indexing with a BitArray, my choice of wording is perhaps nonstandard :stuck_out_tongue:

julia> i = A .> 0.5
10000-element BitArray{1}

Ah, ok, but still, in the above example it has to first go through the whole array to build A[A .> 0.5], so why is it so much faster?

Yeah, the last line in this snippet

function _unsafe_getindex(::IndexStyle, A::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
    # This is specifically not inlined to prevent excessive allocations in type unstable code
    shape = index_shape(I...)
    dest = similar(A, shape)

The shape there is of type Base.OneTo(length), so this is more efficient: only one new array is created, with the correct size, right from the beginning.
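
In other words, the mask already tells us how many elements will be selected, so the destination can be allocated at its final size before anything is copied (a rough sketch of what the logical-indexing path effectively does, not the actual Base code):

mask = A .> 0.5           # BitArray with one bit per element of A
n    = count(mask)        # number of selected elements, known before copying
dest = similar(A, n)      # A[mask] can allocate its result at the final size once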

The other two methods incrementally build the result array, starting from a small array, without knowing how long it will end up. Growing the array whenever it reaches its current capacity is costly in this setting.
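
Roughly, the two strategies compare like this (an illustrative sketch of the general pattern, not the actual code of either path):

# (a) Length unknown up front: grow the result as elements pass the test,
#     which is essentially what the filtered comprehension and filter must do.
function collect_growing(A)
    out = Vector{eltype(A)}()          # starts empty, reallocates as it fills up
    for a in A
        a > 0.5 && push!(out, a^2)
    end
    return out
end

# (b) Length known up front: count the mask first, then allocate the
#     destination once at its final size, as A[A .> 0.5] can.
function collect_preallocated(A)
    mask = A .> 0.5
    out = Vector{eltype(A)}(undef, count(mask))
    j = 0
    for a in A
        if a > 0.5
            out[j += 1] = a^2
        end
    end
    return out
end

The amortized cost of push! is still O(1), but each capacity growth copies the elements accumulated so far, which accounts for the extra allocations visible in the first two timings.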


Yeah, thanks, this is the explanation I was looking for.

However, it leaves me a little bit unsatisfied, in the sense that it makes a difference which method you use :frowning:

Yeah, I have no good answer there, other than that benchmarking is simple and often worth it :slight_smile: (if this code is performance-sensitive and a bottleneck; otherwise there’s no need to bother).


The relative timing may vary based on how many elements of A pass through the filter. If almost no elements satisfy the condition, the first two might be faster (the growing array stays tiny, while the mask-based version still has to scan all of A and build the mask). If most elements pass, the last one will win.


that’s for sure :wink:

With different numbers of elements passing through the filter, I get these timings:

julia> @btime [a^2 for a ∈ A if a > 0.5];
  99.445 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.5, A));
  15.326 μs (6 allocations: 117.86 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.5]];
  24.956 μs (10 allocations: 84.89 KiB)


julia> @btime [a^2 for a ∈ A if a > 0.95];
  17.587 μs (13 allocations: 16.50 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.95, A));
  10.624 μs (5 allocations: 82.41 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.95]];
  15.191 μs (8 allocations: 13.98 KiB)

There is indeed a difference in relative timing. I get the second version as the fastest on Julia v1.3-rc5; maybe it has seen performance improvements.


By the way, if used as a generator, the timing is different:

julia> @btime sum(a for a ∈ A if a > 0.5);
  61.740 μs (2 allocations: 32 bytes)

julia> @btime sum(map(x -> x, filter(x -> x > 0.5, A)));
  194.161 μs (16 allocations: 167.36 KiB)

julia> @btime sum(a for a ∈ A[A .> 0.5]);
  84.090 μs (8 allocations: 44.38 KiB)

Those are all expected, though, as sum over a generator does not have to construct a vector. The last two do construct the vector, which is expensive, whereas the first one does not.
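
For the map/filter style, keeping things lazy with Iterators.filter would also avoid the intermediate vector (untimed sketch):

sum(Iterators.filter(x -> x > 0.5, A))   # lazy filter: no intermediate array is materialized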

Here, you can speed things up further:

julia> @btime sum(a for a ∈ A if a > 0.5);
  36.884 μs (2 allocations: 32 bytes)

julia> @btime sum(a->ifelse(a>0.5, a, zero(a)),  A);
  2.263 μs (0 allocations: 0 bytes)
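
The same branch-free ifelse trick applies to the squared version from the original comparison (untimed sketch):

sum(a -> ifelse(a > 0.5, a^2, zero(a)), A)   # no intermediate array; ifelse avoids the branch, so the loop can vectorize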