I am trying to find the optimum way of obtaining the indices of an array which satisfy a certain boolean condition. For example, if a is of type Array{Float64,1}, i wish to obtain an array of indices i for which a[i]>0. This task can be easily accomplished by using a loop, but that would not be optimal.
I have looked extensively in the documentation for the Base class, but didnt find something that achieves what I am aiming for. There are some nice functions such as filter(f, a::AbstractArray) which does not return the indices but the elements at the indices.
A loop would (presumably) be optimal for the computer, but I agree that less typing should be required, and hence itās not necessarily optimal for the human.
Although you probably would have spent less time writing the loop than looking in the docs and writing this post!
Thank you for suggesting findall. It worked very well. However, I think there is a slight mistake in your notation, since I get
findall(>(0), a)
ERROR: MethodError: no method matching >(::Int64)
Closest candidates are:
>(::Any, ::Any) at operators.jl:286
>(::BigFloat, ::BigFloat) at mpfr.jl:756
Stacktrace:
[1] top-level scope at none:0
julia> v = randn(1000);
julia> function myfindall(condition, x)
results = Int[]
for i in 1:length(x)
if condition(x[i])
push!(results, i)
end
end
return results
end
myfindall (generic function with 1 method)
julia> @btime findall(x -> x > 0, $v);
3.438 Ī¼s (13 allocations: 8.47 KiB)
julia> @btime myfindall(x -> x > 0, $v);
3.707 Ī¼s (9 allocations: 8.33 KiB)
While true, the Base implementations might use tricks and optimizations that are not always obvious which often make them faster than the naive for loop.
But the Base implementations arenāt āprivilegedā. With some experience anyone could implement their own version that matches Base, or does even better if they have special knowledge of the specific problem.
So, @iamsuddhasattwa, you shouldnāt dismiss the possibility of writing your own implementation, though findall may be what you were looking for this time.
Hereās an example of how you can beat the Base implementation of findall for a special case. Letās say, for example, that you know that most of the elements in your array satisfy your condition. Then it can be more efficient to allocate a large output array in the first place rather than growing it dynamically, like findall does:
function myfindall(p, X)
out = Vector{Int}(undef, length(X))
ind = 0
@inbounds for (i, x) in pairs(X)
if p(x)
out[ind+=1] = i
end
end
resize!(out, ind)
return out
end
Thereās nothing magical going on, just ordinary code you can write yourself. Now letās benchmark:
I am not sure what you consider a ānaiveā loop, but most loops which just traverse something in a memory-friendly order are usually on par with what Base does when the performance tips are kept in mind. For the rest, yes, some tricks can be necessary. The majority of the constructs that make Base implementations fast are exposed.
I would say that loops are usually one of the fastest solution in Julia, and the only reason not to apply them is code clarity: more abstract code is usually easier to understand.
I would disagree with that. There are many optimization opportunities over looping in memory order and in quite a few cases Base code goes that extra step to squeeze out a bit more performance.
Itās faster for almost every case I tested (the only exception being small arrays where few elements satisfy the condition). The advantage likely boils down to SIMD execution of the comparison, and avoiding the need to repeatedly grow/copy the output array. Itād be great for the iterator-based implementation to specialize on strided numeric arrays to take advantage of SIMD.