Mapreduce performance and dispatch

I’m having trouble getting mapreduce to dispatch to an efficient method and am looking for tips. Profiling shows that, when given two arrays, it calls map and then reduce as two separate steps instead of fusing them the way mapfoldl does.
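As far as I can tell from Base, the multi-array method of mapreduce is defined essentially as reduce(op, map(f, A...)), which would explain both the profile and the extra allocation. A minimal sketch comparing the two forms (the sizes here are arbitrary):

```julia
using StaticArrays

x = rand(SVector{3, Float64}, 100)
y = rand(Float64, 100)

# Multi-array form used in the post: apply * to each (x[i], y[i]) pair, reduce with +.
a = mapreduce(*, +, x, y)

# Explicit two-step version: map(*, x, y) materializes a Vector{SVector{3, Float64}}
# (the temporary that shows up in the benchmark), which reduce(+, ...) then sums.
b = reduce(+, map(*, x, y))

println(a ≈ b)
```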

There’s a similar issue here, and I reuse its example:

using StaticArrays, BenchmarkTools

# hand-coded loop
function prodsum(x, y; init=zero(eltype(x)))
    v = init
    # eachindex(x, y) checks that x and y share the same indices, so @inbounds is safe
    for i in eachindex(x, y)
        @inbounds v += x[i]*y[i]
    end
    return v
end

function test()
    n = 10000
    x = rand(SVector{3, Float64}, n)
    y = rand(Float64, n)

    @btime prodsum($x, $y)                      # 9.208 μs (0 allocations: 0 bytes)
    @btime sum($x .* $y)                        # 9.708 μs (2 allocations: 234.42 KiB)
    @btime mapreduce(*, +, $x, $y)              # 11.000 μs (6 allocations: 234.55 KiB)
    @btime mapfoldl(splat(*), +, zip($x, $y))   # 9.208 μs (0 allocations: 0 bytes)
end

test()

How can I use mapreduce effectively here, without allocating the intermediate array?
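Two fused formulations I would expect to avoid the temporary (an untimed sketch; prodsum_zip and prodsum_idx are hypothetical names). Passing zip(x, y) means mapreduce receives a general iterator rather than arrays, so it forwards to mapfoldl just like the last benchmark line. Reducing over eachindex(x, y) keeps the array reduction path but indexes x and y directly inside the mapped function, so nothing is materialized.

```julia
using StaticArrays

# 1. mapreduce over a zip iterator: zip is not an AbstractArray, so this
#    goes through mapfoldl; `init` also covers the empty-input case.
prodsum_zip(x, y; init = zero(eltype(x))) =
    mapreduce(((a, b),) -> a * b, +, zip(x, y); init = init)

# 2. mapreduce over the shared index range: eachindex(x, y) is a range,
#    so the array reduction runs without building x .* y first.
prodsum_idx(x, y) = mapreduce(i -> @inbounds(x[i] * y[i]), +, eachindex(x, y))

x = rand(SVector{3, Float64}, 1000)
y = rand(Float64, 1000)
println(prodsum_zip(x, y) ≈ prodsum_idx(x, y))
```

With BenchmarkTools loaded, @btime prodsum_zip($x, $y) and @btime prodsum_idx($x, $y) can be compared directly against the timings above.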

Btw, passing an init value produces some strange performance results: sum and mapreduce get noticeably slower, while prodsum and mapfoldl are unaffected.

function test_init()
    n = 10000
    x = rand(SVector{3, Float64}, n)
    y = rand(Float64, n)
    x0 = zeros(SVector{3, Float64})

    @btime prodsum($x, $y, init=$x0)                    # 9.209 μs (0 allocations: 0 bytes)
    @btime sum($x .* $y, init=$x0)                      # 15.417 μs (2 allocations: 234.42 KiB)
    @btime mapreduce(*, +, $x, $y, init=$x0)            # 15.291 μs (2 allocations: 234.42 KiB)
    @btime mapfoldl(splat(*), +, zip($x, $y), init=$x0) # 9.208 μs (0 allocations: 0 bytes)
end

test_init()
