Comprehension vs map and filter unexpected speeds

stakaz · November 20, 2019, 1:02pm

Hello, I have the following comparison and I wonder why the last way to achieve the result is the fastest as i should create an additional array whereas the fist two don’t.

julia> const A = rand(10000);
WARNING: redefining constant A

julia> @time [a^2 for a ∈ A if a > 0.5];
  0.138524 seconds (80.82 k allocations: 3.978 MiB)

julia> @time map(x -> x^2, filter(x -> x > 0.5, A));
  0.130198 seconds (69.27 k allocations: 3.601 MiB)

julia> @time [a^2 for a ∈ A[A .> 0.5]];
  0.099821 seconds (51.53 k allocations: 2.623 MiB)

I have tested it with a few array sizes and several tries – the behavior remains the same…

baggepinnen · November 20, 2019, 1:06pm

You want to use @btime, notice the b

stakaz · November 20, 2019, 1:09pm

Yeah, I have tried but the result is even more pronounced:

julia> const A = rand(10000);
WARNING: redefining constant A

julia> using BenchmarkTools

julia> @btime [a^2 for a ∈ A if a > 0.5];
  188.852 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.5, A));
  193.042 μs (16 allocations: 167.36 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.5]];
  88.000 μs (10 allocations: 83.14 KiB)

The problem here is probably that @btime explicitly tries to not to measure the time for creating A[A .> 0.5]. But in this case this is unfair than.

I have forgotten to remove BenchmarkTools in the previous example.

baggepinnen · November 20, 2019, 1:11pm

I don’t think that’s the case. It would be the case if you interpolated that expression, which you don’t.

stakaz · November 20, 2019, 1:13pm

However, the time is roughly the half in the last case so something very useful is optimized here.

I have tried to remove the ^ and the result persists.

julia> @btime [a for a ∈ A if a > 0.5];
  186.337 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x, filter(x -> x > 0.5, A));
  192.763 μs (16 allocations: 167.36 KiB)

julia> @btime [a for a ∈ A[A .> 0.5]];
  88.280 μs (10 allocations: 83.14 KiB)

baggepinnen · November 20, 2019, 1:14pm

The first one is probably slow because the length of the final result is unknown, which it is not in the last two. Not sure why second is slower than third though

baggepinnen · November 20, 2019, 1:15pm

Ah the filter does not know the length of the result. The binary indexing probably does then…?

stakaz · November 20, 2019, 1:17pm

I don’t know this one, could you explain, please…

baggepinnen · November 20, 2019, 1:20pm

I was referring to indexing with a BitArray, my choice of wording is perhaps nonstandard

julia> i = a .> 0.5
100-element BitArray{1}

stakaz · November 20, 2019, 1:21pm

Ah, ok, but still, in the above example it has to first go through the whole array to get A[A .> 0.5] why is it so much faster?

baggepinnen · November 20, 2019, 1:22pm

Yeah, the last line in this snippet

function _unsafe_getindex(::IndexStyle, A::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
    # This is specifically not inlined to prevent excessive allocations in type unstable code
    shape = index_shape(I...)
    dest = similar(A, shape)

the shape is of type Base.OneTo(length) there, so this will be more efficient as only one new array is created with the correct size from the beginning

baggepinnen · November 20, 2019, 1:24pm

The other two methods incrementally build the result array, starting with a small array, not knowing how long it will end up. Growing the array when it has reached it’s maximum capacity is costly in this setting.

stakaz · November 20, 2019, 1:26pm

Yeah, thanks, this is the explanation I have searched for.

However, it lets me a little bit unsatisfied in the since that it makes a difference which method to use

baggepinnen · November 20, 2019, 1:27pm

yeah I have no good answer there, other than benchmarking is simple and often worth it (if this code is performance sensitive and a bottleneck, otherwise no need to bother)

baggepinnen · November 20, 2019, 1:28pm

The relative timing may vary based on how many elements of A passes through the filter. If almost no elements satisfies the condition, the first two might be faster. If most elements fit, the last one will win.

stakaz · November 20, 2019, 1:28pm

that’s for sure

baggepinnen · November 20, 2019, 1:32pm

With different number of elements passing through the filter, I get these timings

julia> @btime [a^2 for a ∈ A if a > 0.5];

  99.445 μs (16 allocations: 128.69 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.5, A));
  15.326 μs (6 allocations: 117.86 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.5]];
  24.956 μs (10 allocations: 84.89 KiB)


julia> @btime [a^2 for a ∈ A if a > 0.95];
  17.587 μs (13 allocations: 16.50 KiB)

julia> @btime map(x -> x^2, filter(x -> x > 0.95, A));
  10.624 μs (5 allocations: 82.41 KiB)

julia> @btime [a^2 for a ∈ A[A .> 0.95]];
  15.191 μs (8 allocations: 13.98 KiB)

There is indeed a difference in relative timing. I have the second version as the fastest on julia v1.3-rc5, maybe it has seen performance improvements

stakaz · November 20, 2019, 1:42pm

By the way, if used as a generator, the timing is different:

julia> @btime sum(a for a ∈ A if a > 0.5);
  61.740 μs (2 allocations: 32 bytes)

julia> @btime sum(map(x -> x, filter(x -> x > 0.5, A)));
  194.161 μs (16 allocations: 167.36 KiB)

julia> @btime sum(a for a ∈ A[A .> 0.5]);
  84.090 μs (8 allocations: 44.38 KiB)

baggepinnen · November 20, 2019, 1:51pm

Those are all expected though, as the sum does not have to construct a vector. The last two does construct the vector, which is expensive, whereas the first one does not.

baggepinnen · November 20, 2019, 1:53pm

Here, you can speed thing up further

julia> @btime sum(a for a ∈ A if a > 0.5);
  36.884 μs (2 allocations: 32 bytes)

julia> @btime sum(a->ifelse(a>0.5, a, zero(a)),  A);
  2.263 μs (0 allocations: 0 bytes)

Topic		Replies	Views
Trying to bench two code samples and I'm getting different results @time vs @btime New to Julia	3	385	September 16, 2020
Performance of variants of findall Performance	15	1464	February 28, 2023
.= vs = speed difference New to Julia	2	553	June 12, 2019
Substituting elements of array is very slow General Usage	23	1365	July 22, 2021
Which is the most efficient way to add matrices/arrays in Julia 1.x? Performance matrices , julia-1x	11	2117	March 26, 2021

Comprehension vs map and filter unexpected speeds

Related topics