Mapreduce slower than sum

I am comparing three approaches for point-wise multiplication of a vector field and a scalar field, followed by a summation over all elements of the resulting vector field. Why does the mapreduce approach allocate more than both the sum approach and my own prodsum approach?

I would expect the mapreduce approach to perform as well as my own prodsum implementation, and the sum approach to be the slowest. As it turns out, my own implementation is the most efficient and the mapreduce approach is the least efficient.

using StaticArrays

function prodsum(x, y)
    # Start from the first product so the accumulator has the right type
    v = x[1]*y[1]
    for i = 2:length(x)
        @inbounds v += x[i]*y[i]
    end
    return v
end

function test()
    n = 10000
    x = rand(SVector{3, Float64}, n)
    y = rand(Float64, n)

    @time sum(x .* y)                        # broadcast, then sum
    @time mapreduce((a, b) -> a*b, +, x, y)  # multi-argument mapreduce
    @time prodsum(x, y)                      # hand-written loop
end


mapreduce(*, +, x, y) dispatches to a method which allocates a temporary array. It also seems that the function arguments are not fully specialized. mapfoldl(Base.splat(*), +, zip(x, y)) seems to be as fast as the hand-coded function.
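To make this concrete, here is a minimal sketch of the two variants. It assumes that the multi-array method of mapreduce behaves like reduce(op, map(f, x, y)), which is my understanding of the definition in Base for recent Julia versions; treat that as an assumption and check with @which on your own version. The names via_map and via_zip are made up for illustration.

```julia
using StaticArrays

# Assumed equivalent of mapreduce(*, +, x, y):
# map materializes a temporary array of products before reducing.
via_map(x, y) = reduce(+, map(*, x, y))

# Lazy alternative: zip yields (x[i], y[i]) tuples and Base.splat(*)
# multiplies each pair, so no intermediate array is built.
via_zip(x, y) = mapfoldl(Base.splat(*), +, zip(x, y))

x = rand(SVector{3, Float64}, 1000)
y = rand(Float64, 1000)

# The two agree up to floating-point rounding (the summation order may differ).
via_map(x, y) ≈ via_zip(x, y)
```

The zip-based version trades the temporary array for an iterator of tuples, which is why it can get close to the hand-written loop.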


Thanks for pointing out the dispatch and the allocation of a temporary array.

However, the performance on my machine is not the same. The mapfoldl approach is consistently about 30% slower (for n = 10000) and also makes two allocations, which the hand-coded function does not.
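For reference, here is a sketch of how the allocation counts could be checked in isolation. It uses @allocated from Base after a warm-up call, so that compilation is not counted. The helper names zipsum, bytes_prodsum and bytes_zipsum are hypothetical, and I am not claiming this explains the two allocations, only that it separates them from the measurement context.

```julia
using StaticArrays

function prodsum(x, y)
    v = x[1] * y[1]
    for i in 2:length(x)
        @inbounds v += x[i] * y[i]
    end
    return v
end

# Hypothetical name for the mapfoldl variant suggested above
zipsum(x, y) = mapfoldl(Base.splat(*), +, zip(x, y))

# Wrap the measurement in functions so that the non-constant globals
# x and y do not contribute allocations of their own.
bytes_prodsum(x, y) = @allocated prodsum(x, y)
bytes_zipsum(x, y) = @allocated zipsum(x, y)

x = rand(SVector{3, Float64}, 10_000)
y = rand(Float64, 10_000)

# Warm-up calls: the first call compiles and should not be measured.
bytes_prodsum(x, y); bytes_zipsum(x, y)

println("prodsum: ", bytes_prodsum(x, y), " bytes")
println("zipsum:  ", bytes_zipsum(x, y), " bytes")
```

If both report the same number here, the extra allocations seen with @time are more likely to come from how the call is measured than from mapfoldl itself.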