Findall slow

Hello,

I would like to find the indices of a 1D array which satisfy a given condition. I think that the way to go is to use the function findall. However, I have found that this trick is actually faster and uses far fewer memory allocations. Any thoughts on that?

le = randn(10000)

@time findall( x -> (x > 0.0), le)
0.037949 seconds (96.30 k allocations: 4.746 MiB, 21.56% gc time)

@time (1:length(le))[le .> 0.0]
0.000035 seconds (15 allocations: 44.359 KiB)

I know that @time is not the best way to time a function, but I am puzzled by the difference.

My guess is that findall is optimized under the assumption that few entries match. As a result, it appends to build its list of matches, which is slow and allocates more when everything matches. When few entries match, on the other hand, your version will still allocate an array the size of le and should be much slower than findall.
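To sketch that flip side (the threshold 3.3 is arbitrary, chosen so that almost no entries of a standard normal sample match): the broadcast version always pays for a full-length mask, while findall only grows its result as needed.

```julia
le = randn(10_000)

# The mask is a full-length BitVector no matter how few entries match.
mask = le .> 3.3
idx_broadcast = (1:length(le))[mask]

# findall builds the (here tiny) result incrementally instead.
idx_findall = findall(x -> x > 3.3, le)

@assert idx_broadcast == idx_findall
@assert length(mask) == length(le)
```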

Your benchmarking is a bit problematic, so the actual difference isn’t as drastic as your numbers suggest:

julia> using BenchmarkTools

julia> @btime findall( x -> (x > 0.0), $le);
  80.202 μs (17 allocations: 128.72 KiB)

julia> @btime (1:length($le))[$le .> 0.0];
  18.927 μs (7 allocations: 44.61 KiB)

This is still very suboptimal though. The actual implementation of findall currently looks like this:

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))
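For reference, that generator definition can be checked by hand on a small example:

```julia
A = [0.4, -1.2, 2.5, -0.3]

# Same shape as the Base definition: iterate (index, value) pairs and
# keep the index whenever the predicate holds for the value.
byhand = collect(first(p) for p in pairs(A) if last(p) > 0)

@assert byhand == findall(x -> x > 0, A)  # both give [1, 3]
```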

It seems that adding

findall(testf::Function, A::AbstractArray) = keys(A)[testf.(A)]

would increase performance in your case quite a bit, although one would still have to benchmark how the performance differs across array sizes and percentages of matched keys. Definitely worth investigating further.
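A quick sketch of that proposed method (named findall_bcast here so it doesn’t overwrite Base.findall; the name is mine, not from Base):

```julia
findall_bcast(testf, A::AbstractArray) = keys(A)[testf.(A)]

v = randn(1_000)
@assert findall_bcast(x -> x > 0.0, v) == findall(x -> x > 0.0, v)

# keys of a Matrix is a CartesianIndices, so the CartesianIndex
# results of Base.findall on matrices are preserved too.
M = randn(4, 4)
@assert findall_bcast(x -> x > 0.0, M) == findall(x -> x > 0.0, M)
```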


Here’s an implementation (based on filter) that’s faster for your example:

function findall2(f, a::Array{T, N}) where {T, N}
    j = 1
    b = Vector{Int}(undef, length(a))
    @inbounds for i in eachindex(a)
        # Branchless: always write the candidate index, but only
        # advance the write position when the predicate holds.
        b[j] = i
        j = ifelse(f(a[i]), j+1, j)
    end
    resize!(b, j-1)          # trim to the number of matches
    sizehint!(b, length(b))  # release the excess capacity
    return b
end

And here are the benchmark timings.
findall2 is about 8x faster than findall for this example.

Julia-1.3.0-rc4> @btime findall(x -> (x > 0.0), $le);
  86.199 μs (17 allocations: 128.72 KiB)

Julia-1.3.0-rc4> @btime (1:length($le))[$le .> 0.0];
  16.400 μs (7 allocations: 44.73 KiB)

Julia-1.3.0-rc4> @btime findall2(x -> (x > 0.0), $le);
  10.699 μs (3 allocations: 78.20 KiB)
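Before trusting the speedup, it’s worth a sanity check that findall2 agrees with Base across match densities (the definition is repeated so the snippet runs standalone; the thresholds give roughly 100%, 50%, and 0.05% hit rates on standard normal data):

```julia
function findall2(f, a::Array{T, N}) where {T, N}
    j = 1
    b = Vector{Int}(undef, length(a))
    @inbounds for i in eachindex(a)
        b[j] = i                      # always write the candidate index
        j = ifelse(f(a[i]), j+1, j)   # advance only on a match
    end
    resize!(b, j-1)
    sizehint!(b, length(b))
    return b
end

v = randn(10_000)
for t in (-3.3, 0.0, 3.3)
    @assert findall2(x -> x > t, v) == findall(x -> x > t, v)
end
@assert findall2(x -> x > 0.0, Float64[]) == Int[]  # empty-input edge case
```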

Can you add benchmarks where the predicate matches more than 90% of the entries? If it’s faster there as well, Base should probably change its implementation.

Here is another variant

function findall3(f, a::Array{T, N}) where {T, N}
    j = 1
    b = Vector{Int}(undef, length(a))
    @inbounds for i in eachindex(a)
        # Branchy: write the index only when the predicate holds.
        if f(a[i])
            b[j] = i
            j += 1
        end
    end
    resize!(b, j-1)          # trim to the number of matches
    sizehint!(b, length(b))  # release the excess capacity
    return b
end

Benchmark results:

julia> for l in 10 .^ (1:5)
           @show l
           le = randn(l)
           for i in (-3.3, 0.0, 3.3)
               @show i
               @btime findall(x -> (x > $i), $le);
               @btime findall2(x -> (x > $i), $le);
               @btime findall3(x -> (x > $i), $le);
           end
       end
l = 10
i = -3.3
  202.096 ns (8 allocations: 512 bytes)
  46.945 ns (1 allocation: 160 bytes)
  45.187 ns (1 allocation: 160 bytes)
i = 0.0
  125.518 ns (6 allocations: 288 bytes)
  51.448 ns (1 allocation: 160 bytes)
  48.814 ns (1 allocation: 160 bytes)
i = 3.3
  55.704 ns (4 allocations: 160 bytes)
  50.980 ns (1 allocation: 160 bytes)
  46.545 ns (1 allocation: 160 bytes)
l = 100
i = -3.3
  911.619 ns (11 allocations: 2.36 KiB)
  154.021 ns (1 allocation: 896 bytes)
  140.260 ns (1 allocation: 896 bytes)
i = 0.0
  610.851 ns (10 allocations: 1.30 KiB)
  164.151 ns (1 allocation: 896 bytes)
  145.557 ns (1 allocation: 896 bytes)
i = 3.3
  140.647 ns (4 allocations: 160 bytes)
  166.599 ns (1 allocation: 896 bytes)
  128.368 ns (1 allocation: 896 bytes)
l = 1000
i = -3.3
  8.444 μs (14 allocations: 16.55 KiB)
  1.216 μs (1 allocation: 7.94 KiB)
  1.112 μs (1 allocation: 7.94 KiB)
i = 0.0
  5.133 μs (13 allocations: 8.48 KiB)
  1.208 μs (1 allocation: 7.94 KiB)
  1.198 μs (1 allocation: 7.94 KiB)
i = 3.3
  985.500 ns (6 allocations: 288 bytes)
  1.195 μs (1 allocation: 7.94 KiB)
  848.512 ns (1 allocation: 7.94 KiB)
l = 10000
i = -3.3
  80.101 μs (18 allocations: 256.80 KiB)
  10.604 μs (2 allocations: 78.20 KiB)
  9.523 μs (2 allocations: 78.20 KiB)
i = 0.0
  97.453 μs (17 allocations: 128.73 KiB)
  12.121 μs (3 allocations: 78.20 KiB)
  44.257 μs (3 allocations: 78.20 KiB)
i = 3.3
  9.108 μs (8 allocations: 512 bytes)
  10.377 μs (3 allocations: 78.20 KiB)
  7.053 μs (3 allocations: 78.20 KiB)
l = 100000
i = -3.3
  786.670 μs (21 allocations: 2.00 MiB)
  114.514 μs (2 allocations: 781.33 KiB)
  102.834 μs (2 allocations: 781.33 KiB)
i = 0.0
  1.077 ms (20 allocations: 1.00 MiB)
  143.636 μs (3 allocations: 781.33 KiB)
  547.014 μs (3 allocations: 781.33 KiB)
i = 3.3
  91.253 μs (10 allocations: 1.30 KiB)
  103.810 μs (3 allocations: 781.33 KiB)
  68.090 μs (3 allocations: 781.33 KiB)

findall3 is consistently faster than findall and findall2 except for i = 0 at l = 10000 and l = 100000, where findall2 is faster (I don’t understand why; one guess is branch misprediction, since at i = 0 the if in findall3 succeeds about half the time and is hardest to predict, while findall2 is branchless).


I believe that findall & friends are best for use cases which are relatively “sparse”, ie the number of things to be found is small relative to the total. This is of course a very vague definition, but in practice the issue is the same as for sparse matrices (ie when it becomes better to treat it as a dense matrix).

I find it interesting to read the elegant solutions above, but IMO using and optimizing findall for this use case is not the right solution. As @mleprovost observed in the original post, for a “dense” problem broadcasting may be better, and this is fine.

Note that findall3 is consistently faster than findall both for sparse problems (approx 0.0005 hits for finding normally distributed values above 3.3) and dense problems (approx 0.9995 hits for normally distributed values above -3.3).
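The quoted hit rates are easy to check empirically (a quick sketch; the 0.0005 and 0.9995 figures correspond to the standard normal tail mass beyond ±3.3):

```julia
le = randn(1_000_000)
frac_above(t) = count(x -> x > t, le) / length(le)

# Tail mass beyond ±3.3 for a standard normal is about 0.0005.
@assert frac_above(3.3) < 0.01    # the "sparse" problem
@assert frac_above(-3.3) > 0.99   # the "dense" problem
```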


Indeed findall3 is a very nice implementation (and it should work for any AbstractArray, so I guess you could loosen the type restriction). This comes at the cost of allocating more up front (much more for long sequences), but apparently that is not costly enough to work against it.

However, I wonder if it is simply something like

holding findall back.
