Findall slow

Hello,

I would like to find indices of a 1D array wich verify a given condition. I think that the way to go is to use the function findall. However I have found this tricky way to be actually faster and has way less memory allocations. Any thought on that?

le = randn(10000)

@time findall( x -> (x > 0.0), le)
0.037949 seconds (96.30 k allocations: 4.746 MiB, 21.56% gc time)

@time (1:length(le))[le .> 0.0]
0.000035 seconds (15 allocations: 44.359 KiB)

I know that @time is not the best way to time a function but I am puzzled by the difference.

My guess is that findall is optimized under the assumption that few entries match. As a result, it appends to create a list of solutions, which is due and allocates more if everything matches. In that case, yours will allocate an array of size le, and will be much slower.

Your benchmarking is a bit problematic, so the actual difference isnā€™t as drastic as your benchmarks show them:

julia> using BenchmarkTools

julia> @btime findall( x -> (x > 0.0), $le);
  80.202 Ī¼s (17 allocations: 128.72 KiB)

julia> @btime (1:length($le))[$le .> 0.0];
  18.927 Ī¼s (7 allocations: 44.61 KiB)

This is still very suboptimal though. The actual implementation of findall currently looks like this:

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))

It seems that adding

findall(testf::Function, A::AbstractArray) = keys(A)[testf.(A)]

would increase performance in your case quite a bit, although one would still have to benchmark, how this performance differs for different array sizes and percentages of matched keys. Definitely worth investigating further.

3 Likes

Hereā€™s an implementation (based on filter) thatā€™s faster for your example:

function findall2(f, a::Array{T, N}) where {T, N}
    j = 1
    b = Vector{Int}(undef, length(a))
    @inbounds for i in eachindex(a)
        b[j] = i
        j = ifelse(f(a[i]), j+1, j)
    end
    resize!(b, j-1)
    sizehint!(b, length(b))
    return b
end

And hereā€™s the benchmark timings.
findall2 is about 8x faster than findall for this example.

Julia-1.3.0-rc4> @btime findall(x -> (x > 0.0), $le);
  86.199 Ī¼s (17 allocations: 128.72 KiB)

Julia-1.3.0-rc4> @btime (1:length($le))[$le .> 0.0];
  16.400 Ī¼s (7 allocations: 44.73 KiB)

Julia-1.3.0-rc4> @btime findall2(x -> (x > 0.0), $le);
  10.699 Ī¼s (3 allocations: 78.20 KiB)

Can you add benchmarks where itā€™s testing >.90? If itā€™s faster there as well, base should probably change implementation.

Here is another variant

function findall3(f, a::Array{T, N}) where {T, N}
    j = 1
    b = Vector{Int}(undef, length(a))
    @inbounds for i in eachindex(a)
        @inbounds if f(a[i])
            b[j] = i
            j += 1
        end
    end
    resize!(b, j-1)
    sizehint!(b, length(b))
    return b
end

Benchmark results:

julia> for l in 10 .^ (1:5)
           @show l
           le = randn(l)
           for i in (-3.3, 0.0, 3.3)
               @show i
               @btime findall(x -> (x > $i), $le);
               @btime findall2(x -> (x > $i), $le);
               @btime findall3(x -> (x > $i), $le);
           end
       end
l = 10
i = -3.3
  202.096 ns (8 allocations: 512 bytes)
  46.945 ns (1 allocation: 160 bytes)
  45.187 ns (1 allocation: 160 bytes)
i = 0.0
  125.518 ns (6 allocations: 288 bytes)
  51.448 ns (1 allocation: 160 bytes)
  48.814 ns (1 allocation: 160 bytes)
i = 3.3
  55.704 ns (4 allocations: 160 bytes)
  50.980 ns (1 allocation: 160 bytes)
  46.545 ns (1 allocation: 160 bytes)
l = 100
i = -3.3
  911.619 ns (11 allocations: 2.36 KiB)
  154.021 ns (1 allocation: 896 bytes)
  140.260 ns (1 allocation: 896 bytes)
i = 0.0
  610.851 ns (10 allocations: 1.30 KiB)
  164.151 ns (1 allocation: 896 bytes)
  145.557 ns (1 allocation: 896 bytes)
i = 3.3
  140.647 ns (4 allocations: 160 bytes)
  166.599 ns (1 allocation: 896 bytes)
  128.368 ns (1 allocation: 896 bytes)
l = 1000
i = -3.3
  8.444 Ī¼s (14 allocations: 16.55 KiB)
  1.216 Ī¼s (1 allocation: 7.94 KiB)
  1.112 Ī¼s (1 allocation: 7.94 KiB)
i = 0.0
  5.133 Ī¼s (13 allocations: 8.48 KiB)
  1.208 Ī¼s (1 allocation: 7.94 KiB)
  1.198 Ī¼s (1 allocation: 7.94 KiB)
i = 3.3
  985.500 ns (6 allocations: 288 bytes)
  1.195 Ī¼s (1 allocation: 7.94 KiB)
  848.512 ns (1 allocation: 7.94 KiB)
l = 10000
i = -3.3
  80.101 Ī¼s (18 allocations: 256.80 KiB)
  10.604 Ī¼s (2 allocations: 78.20 KiB)
  9.523 Ī¼s (2 allocations: 78.20 KiB)
i = 0.0
  97.453 Ī¼s (17 allocations: 128.73 KiB)
  12.121 Ī¼s (3 allocations: 78.20 KiB)
  44.257 Ī¼s (3 allocations: 78.20 KiB)
i = 3.3
  9.108 Ī¼s (8 allocations: 512 bytes)
  10.377 Ī¼s (3 allocations: 78.20 KiB)
  7.053 Ī¼s (3 allocations: 78.20 KiB)
l = 100000
i = -3.3
  786.670 Ī¼s (21 allocations: 2.00 MiB)
  114.514 Ī¼s (2 allocations: 781.33 KiB)
  102.834 Ī¼s (2 allocations: 781.33 KiB)
i = 0.0
  1.077 ms (20 allocations: 1.00 MiB)
  143.636 Ī¼s (3 allocations: 781.33 KiB)
  547.014 Ī¼s (3 allocations: 781.33 KiB)
i = 3.3
  91.253 Ī¼s (10 allocations: 1.30 KiB)
  103.810 Ī¼s (3 allocations: 781.33 KiB)
  68.090 Ī¼s (3 allocations: 781.33 KiB)

findall3 is consistenly faster than findall and findall2 except for i = 0 at l = 10000 and l = 100000, where findall2 is faster (I donā€™t understand why).

2 Likes

I believe that findall & friends are best for use cases which are relatively ā€œsparseā€, ie the number of things to be found is small relative to the total. This is of course a very vague definition, but in practice the issue is the same as for sparse matrices (ie when it becomes better to treat it as a dense matrix).

I find it interesting to read the elegant solutions above, but IMO using and optimizing findall for this use case is not the right solution. As @mleprovost observed in the original post, for a ā€œdenseā€ problem broadcasting may be better, and this is fine.

Note that findall3 is consistently faster than findall both for sparse problems (approx 0.0005 hits for finding normally distributed values above 3.3) and dense problems (approx 0.9995 hits for normally distributed values above -3.3).

1 Like

Indeed findall3 is a very nice implementation (and should work for all iterables, so I guess you could remove the type restriction). This is at the cost of allocating more (much more for long sequences), but that is not so costly to work against it.

However, I wonder if it is simply something like

https://github.com/JuliaLang/julia/issues/24909

holding findall back.

1 Like