looks like a fun experiment!
you may be interested in this thread: Performance challenge: can you write a faster sum?
and there are lots of benchmarks in the associated PR: "WIP: The great pairwise reduction refactor" by mbauman (JuliaLang/julia, Pull Request #58418)
the most challenging part here is getting speedups that are as uniform as possible across all array types, shapes, sizes, element types, computer architectures, etc.
for example, a change to mapreduce that makes it 20% faster on Array{Float64} might accidentally cause a 10x regression on something like a ReshapedArray{BigInt, 2, SubArray{...}} (not that type in particular, I just made it up for dramatic effect)
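to make that concrete, here is a rough sketch of how one might check a candidate reduction against Base's sum on several array types at once, rather than on Array{Float64} alone. the names candidate_sum and compare are hypothetical, the list of inputs is only illustrative, and the timing is a crude minimum over @elapsed runs (a dedicated package such as BenchmarkTools.jl would give more trustworthy numbers):

```julia
# Hypothetical candidate: a plain left-to-right loop standing in for
# whatever reduction is being experimented with.
function candidate_sum(A)
    s = zero(eltype(A))
    for x in A
        s += x
    end
    return s
end

# A few inputs with different layouts and element types (illustrative only).
data = collect(1.0:10_000.0)
inputs = [
    "Vector{Float64}" => data,
    "Matrix{Float64}" => reshape(copy(data), 100, 100),
    "strided view"    => view(data, 1:2:length(data)),
    "reshaped view"   => reshape(view(data, 1:9_999), 3, 3_333),
    "Vector{BigInt}"  => BigInt.(1:1_000),
]

# Compare the candidate with Base.sum: same answer, and how much slower or faster.
function compare(name, A; reps = 100)
    candidate_sum(A); sum(A)  # warm up so compilation is not timed
    t_ref  = minimum(@elapsed(sum(A)) for _ in 1:reps)
    t_cand = minimum(@elapsed(candidate_sum(A)) for _ in 1:reps)
    ok = candidate_sum(A) ≈ sum(A)
    return (name = name, ok = ok, ratio = t_cand / t_ref)
end

results = [compare(name, A) for (name, A) in inputs]

for r in results
    # ratio > 1 means the candidate is slower than Base.sum on this input
    println(rpad(r.name, 18), " correct=", r.ok, "  time ratio=", round(r.ratio, digits = 2))
end
```

the ratios will differ from machine to machine, which is kind of the point: a single number on a single array type says very little about the other rows.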