Reducing without allocating in stdlib and `Base`


#1

Here’s a little challenge for you: given a collection A find a_{max} \in A s.t. f(a_{max}) \ge f(a) ~ \forall a \in A (in other words, find the supremum of the set f(A)).

Here’s the naive implementation:

function maxf(A, f::Function)
    amax = first(A)
    fofamax = f(amax)
    for i ∈ 2:length(A)
        fofa = f(A[i])
        if fofa ≥ fofamax
            fofamax = fofa
            amax = A[i]
        end
    end
    amax
end

The challenge is, can you find a way to do this, as efficiently as in the naive implementation, using only stdlib and Base (i.e. no control structures!)? This seems like a fair ask, since this is a pretty basic task.

This is close, but no cigar, since it returns f(a_{max}) rather than a_{max}

mapreduce(f, max, A)

(by the way, this actually has amazingly identical performance to the naive implementation)

Maybe I’m missing something, but the reason I decided to create this post was that thinking about this raised a larger question in my mind: do we have nice ways of reducing over collections without allocating? I think that mostly the answer is no. Should there be some simple, intuitive way to do this sort of thing?

I fear that the initial instinct of the vast majority of people would be to do something like last(sort(A, by=f)) which has absolutely abysmal performance due to the allocation.


#2

This gives you a_\mathrm{max} and f(a_\mathrm{max}). Did not check performance though:

mapreduce(x -> (x, f(x)), (x, y) -> x[2] > y[2] ? x : y, A)

#3

My attempt:

julia> maxf(A, f::Function) = mapreduce(x -> (x, f(x)), (xf1,xf2) -> xf1[2] > xf2[2] ? xf1 : xf2, A)[1]
maxf (generic function with 1 method)

julia> A = collect(range(0.0, stop=1.0π, length=25));

julia> @time maxf(A, cos)
  0.000006 seconds (5 allocations: 176 bytes)
0.0

julia> @time maxf(A, sin)
  0.000005 seconds (5 allocations: 176 bytes)
1.5707963267948966

#4

Here are some benchmark results:

A = rand(10^6)

julia> @benchmark maxf(A, abs)  # "naive" implementation
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     2.223 ms (0.00% GC)
  median time:      2.394 ms (0.00% GC)
  mean time:        2.403 ms (0.00% GC)
  maximum time:     3.174 ms (0.00% GC)
  --------------
  samples:          2071
  evals/sample:     1


julia> @benchmark mapreduce(x -> (x, abs(x)), (x, y) -> x[2] > y[2] ? x : y, A)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.671 ms (0.00% GC)
  median time:      1.728 ms (0.00% GC)
  mean time:        1.763 ms (0.00% GC)
  maximum time:     2.104 ms (0.00% GC)
  --------------
  samples:          2820
  evals/sample:     1

So mapreduce is a bit faster, despite allocating slightly more.

It might be nice if you could do maximum(A, by=f), though that wouldn’t solve the more general problem of having blindingly obvious ways of reducing without allocating (or rather minimally allocating).

Perhaps I’m making more of that issue than it deserves…


#5

It would indeed be nice to have a function findmax(f,A), just like there is a find(f,A). I have myself also looked for such a function.


#6

Make sure to interpolate non-constants into the benchmark.

For me, the naive version is faster:

julia> maxf2(A,f) = mapreduce(x -> (x, f(x)), (x, y) -> x[2] > y[2] ? x : y, A)
maxf2 (generic function with 1 method)                                         
                                                                               
julia> using BenchmarkTools                                                    
                                                                               
julia> @benchmark maxf($A,abs)                                                 
BenchmarkTools.Trial:                                                          
  memory estimate:  0 bytes                                                    
  allocs estimate:  0                                                          
  --------------                                                               
  minimum time:     1.455 ms (0.00% GC)                                        
  median time:      1.501 ms (0.00% GC)                                        
  mean time:        1.541 ms (0.00% GC)                                        
  maximum time:     2.249 ms (0.00% GC)                                        
  --------------                                                               
  samples:          3216                                                       
  evals/sample:     1                                                          
                                                                               
julia> @benchmark maxf2($A,abs)                                                
BenchmarkTools.Trial:                                                          
  memory estimate:  0 bytes                                                    
  allocs estimate:  0                                                          
  --------------                                                               
  minimum time:     2.287 ms (0.00% GC)                                        
  median time:      2.375 ms (0.00% GC)                                        
  mean time:        2.419 ms (0.00% GC)                                        
  maximum time:     3.183 ms (0.00% GC)                                        
  --------------                                                               
  samples:          2060                                                       
  evals/sample:     1                                                          

#8

Oh, that’s confusing, I was quite aware of interpolation for benchmarking, but in that particular case I did not think that interpolating A would make any difference. Not sure I understand why it does.

Anyway, what I think I was getting at with this post is that it might be cool to have something like

@noallocate last(sort(A, by=f))

for cases where you just want to reduce over something that would otherwise be allocated. This is probably a terrible example, since it is abundantly clear that sort should return an actual array, and having a macro to elide explicit allocations would just be silly. I also don’t have any good ideas yet for what such a macro would look like. Still, I’m having trouble thinking of lots more examples where this sort of thing is a potential problem, so maybe I’m being silly. It was really bothering me that I couldn’t immediately think of an elegant way of doing this simple algorithm using only Base though, again I fear that most users would just do something that allocates.

Will post if I have any bright ideas.


#9

A macro like that might be a little too magical. Since there already exist a function findmax, the only work needed is to allow passing a predicate function (with the regular findmax(A) being equivalent to findmax(identity,A)), though I haven’t looked at the internals and so am not sure how hard that would be.


#10

Like I said, it wasn’t a good example. I probably should have resisted the urge to post it.


#11

Using a generator is not too bad:

function maxf3(A, f::Function)
    fx, i = findmax(f(a) for a in A)
    return A[i]
end

or

function maxf4(A, f::Function)
    i = indmax(f(a) for a in A)
    return A[i]
end

EDIT: but this requires collection A to be indexable.