Hello, I have the following comparison and I wonder why the last way to achieve the result is the fastest as i should create an additional array whereas the fist two don’t.
julia> const A = rand(10000);
WARNING: redefining constant A
julia> @time [a^2 for a ∈ A if a > 0.5];
0.138524 seconds (80.82 k allocations: 3.978 MiB)
julia> @time map(x -> x^2, filter(x -> x > 0.5, A));
0.130198 seconds (69.27 k allocations: 3.601 MiB)
julia> @time [a^2 for a ∈ A[A .> 0.5]];
0.099821 seconds (51.53 k allocations: 2.623 MiB)
I have tested it with a few array sizes and several tries – the behavior remains the same…
The first one is probably slow because the length of the final result is unknown, which it is not in the last two. Not sure why second is slower than third though
function _unsafe_getindex(::IndexStyle, A::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
# This is specifically not inlined to prevent excessive allocations in type unstable code
shape = index_shape(I...)
dest = similar(A, shape)
the shape is of type Base.OneTo(length) there, so this will be more efficient as only one new array is created with the correct size from the beginning
The other two methods incrementally build the result array, starting with a small array, not knowing how long it will end up. Growing the array when it has reached it’s maximum capacity is costly in this setting.
yeah I have no good answer there, other than benchmarking is simple and often worth it (if this code is performance sensitive and a bottleneck, otherwise no need to bother)
The relative timing may vary based on how many elements of A passes through the filter. If almost no elements satisfies the condition, the first two might be faster. If most elements fit, the last one will win.
With different number of elements passing through the filter, I get these timings
julia> @btime [a^2 for a ∈ A if a > 0.5];
99.445 μs (16 allocations: 128.69 KiB)
julia> @btime map(x -> x^2, filter(x -> x > 0.5, A));
15.326 μs (6 allocations: 117.86 KiB)
julia> @btime [a^2 for a ∈ A[A .> 0.5]];
24.956 μs (10 allocations: 84.89 KiB)
julia> @btime [a^2 for a ∈ A if a > 0.95];
17.587 μs (13 allocations: 16.50 KiB)
julia> @btime map(x -> x^2, filter(x -> x > 0.95, A));
10.624 μs (5 allocations: 82.41 KiB)
julia> @btime [a^2 for a ∈ A[A .> 0.95]];
15.191 μs (8 allocations: 13.98 KiB)
There is indeed a difference in relative timing. I have the second version as the fastest on julia v1.3-rc5, maybe it has seen performance improvements
Those are all expected though, as the sum does not have to construct a vector. The last two does construct the vector, which is expensive, whereas the first one does not.
julia> @btime sum(a for a ∈ A if a > 0.5);
36.884 μs (2 allocations: 32 bytes)
julia> @btime sum(a->ifelse(a>0.5, a, zero(a)), A);
2.263 μs (0 allocations: 0 bytes)