Avoid calling push! in Base.find

I don’t understand why the current find function fills a zero-length array using push! and then creates another array and copies the data into it. I wrote this version, which avoids calling push!, and it’s faster than the current implementation. Should I make a PR with it? Best,

julia> using BenchmarkTools

julia> function find2(testf::Function, A)
           len = 0
           inds = Base._index_remapper(A)
           out = Array(Int, length(inds))  # pre-allocate for the worst case
           for (i, a) in enumerate(A)
               if testf(a)
                   len += 1
                   out[len] = inds[i]
               end
           end
           resize!(out, len)
           return out
       end
find2 (generic function with 1 method)

julia> @benchmark find(x -> x > 0.5, rand(10000))
BenchmarkTools.Trial: 
  memory estimate:  244.61 kb
  allocs estimate:  17
  --------------
  minimum time:     130.360 μs (0.00% GC)
  median time:      143.276 μs (0.00% GC)
  mean time:        152.181 μs (3.08% GC)
  maximum time:     694.662 μs (76.48% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark find2(x -> x > 0.5, rand(10000))
BenchmarkTools.Trial: 
  memory estimate:  156.41 kb
  allocs estimate:  4
  --------------
  minimum time:     79.431 μs (0.00% GC)
  median time:      83.703 μs (0.00% GC)
  mean time:        89.274 μs (2.98% GC)
  maximum time:     558.110 μs (72.61% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

This implementation pre-allocates an index array of the same size as the whole input array. The Base.find function only allocates storage proportional to the size of its output.

Of course, in cases like your test case where the output array is half the size of the input array, this doesn’t save any memory, but in cases where the find output is much smaller it makes a difference.
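To make the trade-off concrete, here is a minimal sketch of the push!-based strategy that Base.find roughly follows (the real implementation also handles non-linear indexing via the same index remapping as above, which this sketch omits): the output array starts empty and grows only as matches are found, so storage stays proportional to the output.

```julia
# Sketch of the incremental (push!-based) strategy, assuming a
# linearly indexable input. Allocation grows with the number of
# matches, not with length(A).
function find_push(testf::Function, A)
    out = Int[]                      # starts empty
    for (i, a) in enumerate(A)
        testf(a) && push!(out, i)    # grows only on a match
    end
    return out
end

find_push(x -> x > 2, [1, 3, 0, 5])  # returns [2, 4]
```

push! grows the backing buffer geometrically, so the amortized cost per match is O(1), at the price of occasional reallocation-and-copy.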

Note that when you benchmark, you should interpolate the arguments of the function you are benchmarking, for example:

julia> A = rand(10^6);

julia> @benchmark find(x -> x < 0.01, $A)
BenchmarkTools.Trial: 
  memory estimate:  334.03 kb
  allocs estimate:  16
  --------------
  minimum time:     931.427 μs (0.00% GC)
  median time:      975.280 μs (0.00% GC)
  mean time:        1.004 ms (1.67% GC)
  maximum time:     2.772 ms (50.97% GC)
  --------------
  samples:          4975
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark find2(x -> x < 0.01, $A)
BenchmarkTools.Trial: 
  memory estimate:  7.63 mb
  allocs estimate:  2
  --------------
  minimum time:     870.728 μs (0.00% GC)
  median time:      926.463 μs (0.00% GC)
  mean time:        1.021 ms (11.63% GC)
  maximum time:     2.344 ms (27.54% GC)
  --------------
  samples:          4890
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

In this case, there is only a 7% performance penalty to the Base.find strategy of incremental allocation of the output, and it saves a factor of 20 in memory allocation.
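For concreteness, the 7.63 mb figure above is exactly the worst-case pre-allocation: find2 always allocates one Int per input element regardless of how many matches there are (assuming 64-bit Ints):

```julia
# Rough arithmetic behind find2's memory estimate for a 10^6-element input:
n = 10^6
bytes = n * sizeof(Int)      # one Int slot per input element
bytes / 2^20                 # ≈ 7.63 MiB, matching the benchmark above
```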

Yes, it’s true! find2 is only a win when there are a lot of true elements, or for short arrays where the extra allocation isn’t a problem. Thanks!