How to obtain indices of an array satisfying boolean condition

I am trying to find the optimum way of obtaining the indices of an array which satisfy a certain boolean condition. For example, if a is of type Array{Float64,1}, i wish to obtain an array of indices i for which a[i]>0. This task can be easily accomplished by using a loop, but that would not be optimal.

I have looked extensively in the documentation for the Base class, but didnt find something that achieves what I am aiming for. There are some nice functions such as filter(f, a::AbstractArray) which does not return the indices but the elements at the indices.

1 Like

a[a .> 0] ?

1 Like
julia> A = randn(10)
10-element Array{Float64,1}:
 -0.10597385263640592
  0.6804124752504479 
 -0.745552871357189  
 -0.8316766727532244 
 -0.4136441092819935 
 -1.7795858830337792 
 -0.7049887890183149 
  1.495347684116737  
 -0.05797357451997173
  0.1689282220877404 

julia> findall(>(0), A)
3-element Array{Int64,1}:
  2
  8
 10
5 Likes

A loop would (presumably) be optimal for the computer, but I agree that less typing should be required, and hence it’s not necessarily optimal for the human.

Although you probably would have spent less time writing the loop than looking in the docs and writing this post!

3 Likes

I have read that Julia’s inline commands such as findall are highly optimized and always recommended over explicit loops.

1 Like

This still does not give the indices. But it was useful knowing about this notation. Thank you.

Thank you for suggesting findall. It worked very well. However, I think there is a slight mistake in your notation, since I get

 findall(>(0), a)
ERROR: MethodError: no method matching >(::Int64)
Closest candidates are:
  >(::Any, ::Any) at operators.jl:286
  >(::BigFloat, ::BigFloat) at mpfr.jl:756
Stacktrace:
 [1] top-level scope at none:0

However, the following syntax works.

 findall(i->(i>0.5), a)
``

I misread your question, sorry.

Single-argument comparisons were added in v1.2, so you must be using an older release. See https://github.com/JuliaLang/julia/blob/master/HISTORY.md#new-library-functions-2

1 Like

I see ! Thanks

I am curious where you read this. Loops are very efficient in Julia when written right (eg using eachindex etc).

4 Likes
julia> v = randn(1000);

julia> function myfindall(condition, x)
       results = Int[]

       for i in 1:length(x)
           if condition(x[i])
               push!(results, i)
           end
       end
       return results
       end
myfindall (generic function with 1 method)

julia> @btime findall(x -> x > 0, $v);
  3.438 μs (13 allocations: 8.47 KiB)

julia> @btime myfindall(x -> x > 0, $v);
  3.707 μs (9 allocations: 8.33 KiB)
3 Likes

While true, the Base implementations might use tricks and optimizations that are not always obvious which often make them faster than the naive for loop.

6 Likes

But the Base implementations aren’t ‘privileged’. With some experience anyone could implement their own version that matches Base, or does even better if they have special knowledge of the specific problem.

So, @iamsuddhasattwa, you shouldn’t dismiss the possibility of writing your own implementation, though findall may be what you were looking for this time.

2 Likes

Here’s an example of how you can beat the Base implementation of findall for a special case. Let’s say, for example, that you know that most of the elements in your array satisfy your condition. Then it can be more efficient to allocate a large output array in the first place rather than growing it dynamically, like findall does:

function myfindall(p, X)
    out = Vector{Int}(undef, length(X))
    ind = 0
    @inbounds for (i, x) in pairs(X)
        if p(x)
            out[ind+=1] = i
        end
    end
    resize!(out, ind)
    return out
end

There’s nothing magical going on, just ordinary code you can write yourself. Now let’s benchmark:

julia> @benchmark findall(>(0.1), r) setup=(r=rand(1000))
BenchmarkTools.Trial: 
  memory estimate:  16.55 KiB
  allocs estimate:  14
  --------------
  minimum time:     8.982 μs (0.00% GC)
  median time:      10.557 μs (0.00% GC)
  mean time:        13.389 μs (17.23% GC)
  maximum time:     8.171 ms (99.39% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark myfindall(>(0.1), r) setup=(r=rand(1000))
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     746.443 ns (0.00% GC)
  median time:      1.241 μs (0.00% GC)
  mean time:        2.408 μs (48.33% GC)
  maximum time:     73.695 μs (98.12% GC)
  --------------
  samples:          10000
  evals/sample:     122

So, my homemade solution beats the builtin findall by a factor of 10 in this special case. (Edit: I moved the @inbounds for an even bigger speedup.)

For the record, here is the Base implementation of findall:

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))

See how simple it is, almost trivial. It’s actually amazing.

5 Likes

I am not sure what you consider a “naive” loop, but most loops which just traverse something in a memory-friendly order are usually on par with what Base does when the performance tips are kept in mind. For the rest, yes, some tricks can be necessary. The majority of the constructs that make Base implementations fast are exposed.

I would say that loops are usually one of the fastest solution in Julia, and the only reason not to apply them is code clarity: more abstract code is usually easier to understand.

5 Likes

Thanks! That was very eye opening.

Just to give some examples of what I mean:

minimum:

findall on a BitArray:

I would disagree with that. There are many optimization opportunities over looping in memory order and in quite a few cases Base code goes that extra step to squeeze out a bit more performance.

4 Likes

The findall(::BitArray) optimization really shines here compared to the iterator-based findall, despite the need to allocate an intermediate BitArray:

julia> @btime findall(A .> 0.1) setup=(A=rand(1000))
  1.833 μs (4 allocations: 11.30 KiB)

julia> @btime findall(>(0.1), A) setup=(A=rand(1000))
  8.533 μs (14 allocations: 16.55 KiB)

It’s faster for almost every case I tested (the only exception being small arrays where few elements satisfy the condition). The advantage likely boils down to SIMD execution of the comparison, and avoiding the need to repeatedly grow/copy the output array. It’d be great for the iterator-based implementation to specialize on strided numeric arrays to take advantage of SIMD.

1 Like

The iterator-based approach would be sped up significantly by resolving issues #32035 and #32320, but those haven’t been touched since last spring.

1 Like