Understanding `Base.findall` and customizing it

Given a vector `ww`, I wish to obtain all its nonzero values together with their positions, with the positions converted to a specified integer type. I solve this with:

```julia
using BenchmarkTools, Test

function _find0(::tp, ww::AbstractVector{tw})::Tuple{Vector{tp},Vector{tw}} where {tp<:Integer, tw}
    pp = findall(!iszero, ww)
    return tp.(pp), ww[pp]
end

O = UInt32(0);   ww = rand(0.0:1.0, 10^5);   @btime _find0($O, $ww);
  154.408 ΞΌs (13 allocations: 989.91 KiB)
```

I’d like to squeeze every last drop of performance out of this, since I execute it in a long loop and it is a bottleneck for my application. Inspired by

my attempt at improvement is:

```julia
function _find1(::tp, ww::AbstractVector{tw})::Tuple{Vector{tp},Vector{tw}} where {tp<:Integer, tw}
    pp = collect(tp, (first(p) for p ∈ pairs(ww) if !iszero(last(p))))
    return pp, ww[pp]
end

O = UInt32(0);   ww = rand(0.0:1.0, 10^5);   @btime _find1($O, $ww);
  698.595 ΞΌs (19 allocations: 797.38 KiB)
```

Why such a performance dip? Both functions seem type-stable:

```julia
@inferred _find0(O, ww);
@inferred _find1(O, ww);
```

Even stranger: if I copy-paste `findall` from Julia Base into my script as `_findall` and use it in my `_find0`, I get the same bad performance. What is going on here?

Paying attention to the comments in the `findall` method you linked:

```julia
# Broadcasting is much faster for small testf, and computing
# integer indices from logical index using findall has a negligible cost
findall(testf::F, A::AbstractArray) where {F<:Function} = findall(testf.(A))
```

Maybe that’s where the difference in execution times lies.
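To make that path concrete, here is a minimal check (my own sketch, using the same `ww` setup as above) of what the quoted method does: it first broadcasts the predicate into a `BitVector`, and only then converts the logical mask into integer indices, which is the cheap part.

```julia
ww = rand(0.0:1.0, 10^5)

# findall(testf, A) for an AbstractArray broadcasts the predicate first,
# producing a BitVector mask, ...
mask = (!iszero).(ww)   # BitVector, one bit per element

# ... and only then converts the logical mask into integer indices.
idx = findall(mask)

# This matches the direct predicate call:
@assert idx == findall(!iszero, ww)
```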

In my case, the following version, which applies the hint from that comment directly, offers a slight improvement. That is what makes me think this is where the difference in performance comes from.

```julia
using BenchmarkTools

function _find0(::tp, ww::AbstractVector{tw})::Tuple{Vector{tp},Vector{tw}} where {tp<:Integer, tw}
    pp = findall(!iszero, ww)
    return tp.(pp), ww[pp]
end

function _find2(tp, ww)
    typep = typeof(tp)
    x = broadcast(!iszero, ww)              # logical mask, as in Base's findall
    res = ww[x]                             # the nonzero values
    pos = (typep(1):typep(length(ww)))[x]   # positions, already in the target type
    return (pos, res)
end
```

```julia
julia> O = UInt32(0);   ww = rand(0.0:1.0, 10^5);

julia> res0 = _find0(O, ww);

julia> res2 = _find2(O, ww);

julia> res0 == res2
true

julia> typeof(res0)
Tuple{Vector{UInt32}, Vector{Float64}}

julia> typeof(res2)
Tuple{Vector{UInt32}, Vector{Float64}}
```

```julia
julia> @benchmark _find0($O, $ww)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  110.400 ΞΌs …  1.509 ms  β”Š GC (min … max): 0.00% … 81.08%
 Time  (median):     120.000 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   132.035 ΞΌs Β± 99.278 ΞΌs  β”Š GC (mean Β± Οƒ):  6.04% Β±  7.40%

   β–ˆβ–†β–ƒ
  β–„β–ˆβ–ˆβ–ˆβ–†β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–‚β–‚β–β–‚β–‚β–β–β–‚β–‚β–β–‚β–‚β–β–β–β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–‚β–‚β–‚β–‚β–‚β–‚ β–‚
  110 ΞΌs          Histogram: frequency by time          337 ΞΌs <

 Memory estimate: 993.48 KiB, allocs estimate: 9.

julia> @benchmark _find2($O, $ww)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   90.200 ΞΌs …  1.300 ms  β”Š GC (min … max): 0.00% … 89.55%
 Time  (median):      94.800 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   100.735 ΞΌs Β± 76.972 ΞΌs  β”Š GC (mean Β± Οƒ):  4.95% Β±  5.97%

       β–ƒβ–β–†β–ˆβ–„β–ˆβ–‡β–‚β–„β–‚
  β–β–‚β–‚β–ƒβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–†β–‡β–‡β–†β–‡β–†β–…β–…β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–ƒ
  90.2 ΞΌs         Histogram: frequency by time          112 ΞΌs <

 Memory estimate: 602.75 KiB, allocs estimate: 7.
```
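If you really want to squeeze out every last drop, one more variant worth trying is a single hand-written pass that fills both output vectors at once and skips the intermediate mask entirely. This is only a sketch of mine (the `_find3` name and the `sizehint!` sizes are my own choices, not anything from Base), and whether it wins will depend on the density of nonzeros in `ww`:

```julia
function _find3(::tp, ww::AbstractVector{tw}) where {tp<:Integer, tw}
    pp = tp[]                   # positions of nonzero entries
    vv = tw[]                   # the nonzero values themselves
    sizehint!(pp, length(ww))   # avoid repeated reallocation in dense cases
    sizehint!(vv, length(ww))
    @inbounds for i in eachindex(ww)
        if !iszero(ww[i])
            push!(pp, tp(i))
            push!(vv, ww[i])
        end
    end
    return pp, vv
end
```

Called as `_find3(UInt32(0), ww)`, it returns the same `Tuple{Vector{UInt32}, Vector{Float64}}` as `_find0` and `_find2`.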

Hmm, it’s super strange. Yesterday, when playing around with `_find1`, I achieved a 2× faster time and fewer allocations than with `_find0`, so I know your solution can be improved further. But I don’t know what exactly I changed (if anything), and I can’t recover that fast solution. :(