How to obtain indices of an array satisfying boolean condition

iamsuddhasattwa · April 17, 2020, 7:41pm

I am trying to find the optimum way of obtaining the indices of an array which satisfy a certain boolean condition. For example, if a is of type Array{Float64,1}, i wish to obtain an array of indices i for which a[i]>0. This task can be easily accomplished by using a loop, but that would not be optimal.

I have looked extensively in the documentation for the Base class, but didnt find something that achieves what I am aiming for. There are some nice functions such as filter(f, a::AbstractArray) which does not return the indices but the elements at the indices.

dpo · April 17, 2020, 7:42pm

a[a .> 0] ?

stillyslalom · April 17, 2020, 8:00pm

julia> A = randn(10)
10-element Array{Float64,1}:
 -0.10597385263640592
  0.6804124752504479 
 -0.745552871357189  
 -0.8316766727532244 
 -0.4136441092819935 
 -1.7795858830337792 
 -0.7049887890183149 
  1.495347684116737  
 -0.05797357451997173
  0.1689282220877404 

julia> findall(>(0), A)
3-element Array{Int64,1}:
  2
  8
 10

dpsanders · April 17, 2020, 8:13pm

A loop would (presumably) be optimal for the computer, but I agree that less typing should be required, and hence it’s not necessarily optimal for the human.

Although you probably would have spent less time writing the loop than looking in the docs and writing this post!

iamsuddhasattwa · April 17, 2020, 8:55pm

I have read that Julia’s inline commands such as findall are highly optimized and always recommended over explicit loops.

iamsuddhasattwa · April 17, 2020, 8:57pm

This still does not give the indices. But it was useful knowing about this notation. Thank you.

iamsuddhasattwa · April 17, 2020, 9:04pm

Thank you for suggesting findall. It worked very well. However, I think there is a slight mistake in your notation, since I get

 findall(>(0), a)
ERROR: MethodError: no method matching >(::Int64)
Closest candidates are:
  >(::Any, ::Any) at operators.jl:286
  >(::BigFloat, ::BigFloat) at mpfr.jl:756
Stacktrace:
 [1] top-level scope at none:0

However, the following syntax works.

 findall(i->(i>0.5), a)
``

dpo · April 17, 2020, 9:20pm

I misread your question, sorry.

stillyslalom · April 17, 2020, 9:22pm

Single-argument comparisons were added in v1.2, so you must be using an older release. See https://github.com/JuliaLang/julia/blob/master/HISTORY.md#new-library-functions-2

iamsuddhasattwa · April 17, 2020, 10:51pm

I see ! Thanks

Tamas_Papp · April 18, 2020, 8:34am

I am curious where you read this. Loops are very efficient in Julia when written right (eg using eachindex etc).

dpsanders · April 18, 2020, 8:47am

julia> v = randn(1000);

julia> function myfindall(condition, x)
       results = Int[]

       for i in 1:length(x)
           if condition(x[i])
               push!(results, i)
           end
       end
       return results
       end
myfindall (generic function with 1 method)

julia> @btime findall(x -> x > 0, $v);
  3.438 μs (13 allocations: 8.47 KiB)

julia> @btime myfindall(x -> x > 0, $v);
  3.707 μs (9 allocations: 8.33 KiB)

kristoffer.carlsson · April 18, 2020, 9:02am

While true, the Base implementations might use tricks and optimizations that are not always obvious which often make them faster than the naive for loop.

DNF · April 18, 2020, 9:15am

But the Base implementations aren’t ‘privileged’. With some experience anyone could implement their own version that matches Base, or does even better if they have special knowledge of the specific problem.

So, @iamsuddhasattwa, you shouldn’t dismiss the possibility of writing your own implementation, though findall may be what you were looking for this time.

DNF · April 18, 2020, 9:46am

Here’s an example of how you can beat the Base implementation of findall for a special case. Let’s say, for example, that you know that most of the elements in your array satisfy your condition. Then it can be more efficient to allocate a large output array in the first place rather than growing it dynamically, like findall does:

function myfindall(p, X)
    out = Vector{Int}(undef, length(X))
    ind = 0
    @inbounds for (i, x) in pairs(X)
        if p(x)
            out[ind+=1] = i
        end
    end
    resize!(out, ind)
    return out
end

There’s nothing magical going on, just ordinary code you can write yourself. Now let’s benchmark:

julia> @benchmark findall(>(0.1), r) setup=(r=rand(1000))
BenchmarkTools.Trial: 
  memory estimate:  16.55 KiB
  allocs estimate:  14
  --------------
  minimum time:     8.982 μs (0.00% GC)
  median time:      10.557 μs (0.00% GC)
  mean time:        13.389 μs (17.23% GC)
  maximum time:     8.171 ms (99.39% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark myfindall(>(0.1), r) setup=(r=rand(1000))
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     746.443 ns (0.00% GC)
  median time:      1.241 μs (0.00% GC)
  mean time:        2.408 μs (48.33% GC)
  maximum time:     73.695 μs (98.12% GC)
  --------------
  samples:          10000
  evals/sample:     122

So, my homemade solution beats the builtin findall by a factor of 10 in this special case. (Edit: I moved the @inbounds for an even bigger speedup.)

For the record, here is the Base implementation of findall:

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))

See how simple it is, almost trivial. It’s actually amazing.

Tamas_Papp · April 18, 2020, 12:36pm

I am not sure what you consider a “naive” loop, but most loops which just traverse something in a memory-friendly order are usually on par with what Base does when the performance tips are kept in mind. For the rest, yes, some tricks can be necessary. The majority of the constructs that make Base implementations fast are exposed.

I would say that loops are usually one of the fastest solution in Julia, and the only reason not to apply them is code clarity: more abstract code is usually easier to understand.

iamsuddhasattwa · April 18, 2020, 2:04pm

Thanks! That was very eye opening.

kristoffer.carlsson · April 18, 2020, 2:18pm

Just to give some examples of what I mean:

minimum:

github.com

JuliaLang/julia/blob/48f942bdd659ae55d7d3e34dcf15dbb8bd871d71/base/reduce.jl#L562-L602


      
          function mapreduce_impl(f, op::Union{typeof(max), typeof(min)},
                                  A::AbstractArray, first::Int, last::Int)
              a1 = @inbounds A[first]
              v1 = mapreduce_first(f, op, a1)
              v2 = v3 = v4 = v1
              chunk_len = 256
              start = first + 1
              simdstop  = start + chunk_len - 4
              while simdstop <= last - 3
                  # short circuit in case of NaN
                  v1 == v1 || return v1
                  v2 == v2 || return v2
                  v3 == v3 || return v3
                  v4 == v4 || return v4
                  @inbounds for i in start:4:simdstop
                      v1 = _fast(op, v1, f(A[i+0]))
                      v2 = _fast(op, v2, f(A[i+1]))
                      v3 = _fast(op, v3, f(A[i+2]))
                      v4 = _fast(op, v4, f(A[i+3]))
                  end

This file has been truncated. show original

findall on a BitArray:

github.com

JuliaLang/julia/blob/48f942bdd659ae55d7d3e34dcf15dbb8bd871d71/base/bitarray.jl#L1612-L1638


      
          function findall(B::BitArray)
              nnzB = count(B)
              I = Vector{eltype(keys(B))}(undef, nnzB)
              nnzB == 0 && return I
              nnzB == length(B) && (allindices!(I, B); return I)
              Bc = B.chunks
              Bs = size(B)
              Bi = i1 = i = 1
              irest = ntuple(one, ndims(B) - 1)
              c = Bc[1]
              @inbounds while true
                  while c == 0
                      Bi == length(Bc) && return I
                      i1 += 64
                      Bi += 1
                      c = Bc[Bi]
                  end
          
                  tz = trailing_zeros(c)
                  c = _blsr(c)

This file has been truncated. show original

I would disagree with that. There are many optimization opportunities over looping in memory order and in quite a few cases Base code goes that extra step to squeeze out a bit more performance.

stillyslalom · April 19, 2020, 4:56am

The findall(::BitArray) optimization really shines here compared to the iterator-based findall, despite the need to allocate an intermediate BitArray:

julia> @btime findall(A .> 0.1) setup=(A=rand(1000))
  1.833 μs (4 allocations: 11.30 KiB)

julia> @btime findall(>(0.1), A) setup=(A=rand(1000))
  8.533 μs (14 allocations: 16.55 KiB)

It’s faster for almost every case I tested (the only exception being small arrays where few elements satisfy the condition). The advantage likely boils down to SIMD execution of the comparison, and avoiding the need to repeatedly grow/copy the output array. It’d be great for the iterator-based implementation to specialize on strided numeric arrays to take advantage of SIMD.

stillyslalom · April 20, 2020, 5:02pm

The iterator-based approach would be sped up significantly by resolving issues #32035 and #32320, but those haven’t been touched since last spring.

Topic		Replies	Views
Find indices or corresponding values except given indices New to Julia indexing	2	329	December 6, 2021
How to find the index of the maximum value of an array that meets certain conditions New to Julia	8	1037	March 23, 2023
Trying to collect indices with multiple conditions New to Julia	4	245	October 25, 2022
Findall() but returning a bool array General Usage question	3	563	August 6, 2021
Questions about vector New to Julia question , arrays	5	343	October 4, 2022

How to obtain indices of an array satisfying boolean condition

Related topics