Reducing without allocating in stdlib and `Base`

ExpandingMan · May 22, 2018, 2:52pm

Here’s a little challenge for you: given a collection A find a_{max} \in A s.t. f(a_{max}) \ge f(a) ~ \forall a \in A (in other words, find the supremum of the set f(A)).

Here’s the naive implementation:

function maxf(A, f::Function)
    amax = first(A)
    fofamax = f(amax)
    for i ∈ 2:length(A)
        fofa = f(A[i])
        if fofa ≥ fofamax
            fofamax = fofa
            amax = A[i]
        end
    end
    amax
end

The challenge is, can you find a way to do this, as efficiently as in the naive implementation, using only stdlib and Base (i.e. no control structures!)? This seems like a fair ask, since this is a pretty basic task.

This is close, but no cigar, since it returns f(a_{max}) rather than a_{max}

mapreduce(f, max, A)

(by the way, this actually has amazingly identical performance to the naive implementation)

Maybe I’m missing something, but the reason I decided to create this post was that thinking about this raised a larger question in my mind: do we have nice ways of reducing over collections without allocating? I think that mostly the answer is no. Should there be some simple, intuitive way to do this sort of thing?

I fear that the initial instinct of the vast majority of people would be to do something like last(sort(A, by=f)) which has absolutely abysmal performance due to the allocation.

fredrikekre · May 22, 2018, 3:18pm

This gives you a_\mathrm{max} and f(a_\mathrm{max}). Did not check performance though:

mapreduce(x -> (x, f(x)), (x, y) -> x[2] > y[2] ? x : y, A)

jmert · May 22, 2018, 3:19pm

My attempt:

julia> maxf(A, f::Function) = mapreduce(x -> (x, f(x)), (xf1,xf2) -> xf1[2] > xf2[2] ? xf1 : xf2, A)[1]
maxf (generic function with 1 method)

julia> A = collect(range(0.0, stop=1.0π, length=25));

julia> @time maxf(A, cos)
  0.000006 seconds (5 allocations: 176 bytes)
0.0

julia> @time maxf(A, sin)
  0.000005 seconds (5 allocations: 176 bytes)
1.5707963267948966

ExpandingMan · May 22, 2018, 3:27pm

Here are some benchmark results:

A = rand(10^6)

julia> @benchmark maxf(A, abs)  # "naive" implementation
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     2.223 ms (0.00% GC)
  median time:      2.394 ms (0.00% GC)
  mean time:        2.403 ms (0.00% GC)
  maximum time:     3.174 ms (0.00% GC)
  --------------
  samples:          2071
  evals/sample:     1


julia> @benchmark mapreduce(x -> (x, abs(x)), (x, y) -> x[2] > y[2] ? x : y, A)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.671 ms (0.00% GC)
  median time:      1.728 ms (0.00% GC)
  mean time:        1.763 ms (0.00% GC)
  maximum time:     2.104 ms (0.00% GC)
  --------------
  samples:          2820
  evals/sample:     1

So mapreduce is a bit faster, despite allocating slightly more.

It might be nice if you could do maximum(A, by=f), though that wouldn’t solve the more general problem of having blindingly obvious ways of reducing without allocating (or rather minimally allocating).

Perhaps I’m making more of that issue than it deserves…

jebej · May 22, 2018, 3:41pm

It would indeed be nice to have a function findmax(f,A), just like there is a find(f,A). I have myself also looked for such a function.

jebej · May 22, 2018, 4:51pm

Make sure to interpolate non-constants into the benchmark.

For me, the naive version is faster:

julia> maxf2(A,f) = mapreduce(x -> (x, f(x)), (x, y) -> x[2] > y[2] ? x : y, A)
maxf2 (generic function with 1 method)                                         
                                                                               
julia> using BenchmarkTools                                                    
                                                                               
julia> @benchmark maxf($A,abs)                                                 
BenchmarkTools.Trial:                                                          
  memory estimate:  0 bytes                                                    
  allocs estimate:  0                                                          
  --------------                                                               
  minimum time:     1.455 ms (0.00% GC)                                        
  median time:      1.501 ms (0.00% GC)                                        
  mean time:        1.541 ms (0.00% GC)                                        
  maximum time:     2.249 ms (0.00% GC)                                        
  --------------                                                               
  samples:          3216                                                       
  evals/sample:     1                                                          
                                                                               
julia> @benchmark maxf2($A,abs)                                                
BenchmarkTools.Trial:                                                          
  memory estimate:  0 bytes                                                    
  allocs estimate:  0                                                          
  --------------                                                               
  minimum time:     2.287 ms (0.00% GC)                                        
  median time:      2.375 ms (0.00% GC)                                        
  mean time:        2.419 ms (0.00% GC)                                        
  maximum time:     3.183 ms (0.00% GC)                                        
  --------------                                                               
  samples:          2060                                                       
  evals/sample:     1

ExpandingMan · May 22, 2018, 5:00pm

Oh, that’s confusing, I was quite aware of interpolation for benchmarking, but in that particular case I did not think that interpolating A would make any difference. Not sure I understand why it does.

Anyway, what I think I was getting at with this post is that it might be cool to have something like

@noallocate last(sort(A, by=f))

for cases where you just want to reduce over something that would otherwise be allocated. This is probably a terrible example, since it is abundantly clear that sort should return an actual array, and having a macro to elide explicit allocations would just be silly. I also don’t have any good ideas yet for what such a macro would look like. Still, I’m having trouble thinking of lots more examples where this sort of thing is a potential problem, so maybe I’m being silly. It was really bothering me that I couldn’t immediately think of an elegant way of doing this simple algorithm using only Base though, again I fear that most users would just do something that allocates.

Will post if I have any bright ideas.

jebej · May 22, 2018, 5:11pm

A macro like that might be a little too magical. Since there already exist a function findmax, the only work needed is to allow passing a predicate function (with the regular findmax(A) being equivalent to findmax(identity,A)), though I haven’t looked at the internals and so am not sure how hard that would be.

ExpandingMan · May 22, 2018, 5:14pm

Like I said, it wasn’t a good example. I probably should have resisted the urge to post it.

greg_plowman · May 23, 2018, 1:04am

Using a generator is not too bad:

function maxf3(A, f::Function)
    fx, i = findmax(f(a) for a in A)
    return A[i]
end

or

function maxf4(A, f::Function)
    i = indmax(f(a) for a in A)
    return A[i]
end

EDIT: but this requires collection A to be indexable.

Topic		Replies	Views
Allocations in mapreduce/sum etc Performance	2	1048	August 1, 2019
What's the best way to optimize findmax with a function? Internals & Design performance	8	1086	May 17, 2020
Extend the domain of maximum and minimum to include empty collection? Internals & Design question , proposal	24	3830	December 22, 2019
Array reductions (sum, mean, etc.) and abstract functions Internals & Design indexing , array	5	1433	July 18, 2017
How to obtain x instead of f(x) from maximum/minimum(f, itr) General Usage	11	126	September 16, 2024

Reducing without allocating in stdlib and `Base`

Related topics