Mapreduce slower than sum

CFBaptista · March 16, 2020, 4:11pm

I am comparing three approaches for a point-wise multiplication of a vector field and a scalar field, followed by a summation over all elements of the resulting vector field. Why does the mapreduce approach allocate more than both the sum approach and my own prodsum approach?

I expected the mapreduce approach to be as performant as my own prodsum implementation, and the sum approach to be the least performant. As it turns out, my own implementation is the most efficient and the mapreduce approach is the least efficient.

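using StaticArrays

# Hand-written loop: accumulate x[i]*y[i] without any temporary array.
function prodsum(x, y)
    v = x[1]*y[1]
    for i = 2:length(x)
        @inbounds v += x[i]*y[i]
    end
    return v
end

function test()
    n = 10000
    x = rand(SVector{3, Float64}, n)  # vector field
    y = rand(Float64, n)              # scalar field

    @time sum(x .* y)                        # broadcast, then sum
    @time mapreduce((a, b) -> a*b, +, x, y)  # multi-argument mapreduce
    @time prodsum(x, y)                      # hand-written loop
end

test();  # first call includes compilation time
test();  # second call shows steady-state timings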

tkf · March 16, 2020, 7:52pm

mapreduce(*, +, x, y) dispatches to:

github.com

JuliaLang/julia/blob/0f1b1192735e1c05c5aa0eab85bef92250abe05c/base/reducedim.jl#L308

mapreduce(f, op, A::AbstractArray; dims=:, kw...) = _mapreduce_dim(f, op, kw.data, A, dims)
mapreduce(f, op, A::AbstractArray...; kw...) = reduce(op, map(f, A...); kw...)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, ::Colon) = mapfoldl(f, op, A; nt...)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, ::Colon) = _mapreduce(f, op, IndexStyle(A), A)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, dims) =
    mapreducedim!(f, op, reducedim_initarray(A, dims, nt.init), A)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, dims) =
    mapreducedim!(f, op, reducedim_init(f, op, A, dims), A)
which allocates a temporary array for map(f, A...) before reducing it. It also seems that the function arguments are not fully specialized. mapfoldl(Base.splat(*), +, zip(x, y)) seems to be as fast as the hand-coded function.
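
To make the difference concrete, here is a rough sketch that puts the three strategies side by side and measures their allocations with @allocated. The names via_map_demo, via_zip_demo and via_loop_demo are made up for illustration, and it assumes plain Float64 elements instead of SVector{3, Float64} so that it runs without StaticArrays. Exact byte counts will vary by Julia version.

```julia
# Mirrors the multi-array fallback shown above: map builds a full
# temporary array of products, then reduce folds it.
via_map_demo(x, y) = reduce(+, map(*, x, y))

# Lazy alternative: zip yields (x[i], y[i]) pairs on demand and
# Base.splat(*) turns each pair into x[i]*y[i]; no product array is built.
via_zip_demo(x, y) = mapfoldl(Base.splat(*), +, zip(x, y))

# Hand-written loop, same structure as prodsum from the first post.
function via_loop_demo(x, y)
    v = x[1] * y[1]
    for i in 2:length(x)
        @inbounds v += x[i] * y[i]
    end
    return v
end

n = 10_000
x = rand(n)  # assumption: Float64 elements instead of SVector{3, Float64}
y = rand(n)

# Warm up once so that compilation is not part of the measurement.
via_map_demo(x, y); via_zip_demo(x, y); via_loop_demo(x, y)

bytes_map  = @allocated via_map_demo(x, y)
bytes_zip  = @allocated via_zip_demo(x, y)
bytes_loop = @allocated via_loop_demo(x, y)
println((bytes_map, bytes_zip, bytes_loop))
```

The map-based version should report at least the size of the temporary array (n * sizeof(Float64) bytes here), while the other two should report far less. For timings, running each call several times (or using a benchmarking package) is more reliable than a single @time.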

CFBaptista · March 17, 2020, 8:25am

Thanks for pointing out the dispatch and the allocation of a temporary array.

However, the performance on my machine is not the same. The mapfoldl approach is consistently about 30% slower (for n = 10000) and also makes two allocations, which the hand-coded function does not.
