Optimizing function barrier for mapreduce

I have a function that counts the number of nodes in a binary tree:

struct Node
    degree::Int
    l::Node
    r::Node  # possibly undefined; assume degree is correct
    Node(::Int) = new(0)
    Node(::Int, l::Node) = new(1, l)
    Node(::Int, l::Node, r::Node) = new(2, l, r)
end

function count_nodes(tree::Node)::Int
    if tree.degree == 0
        return 1
    elseif tree.degree == 1
        return 1 + count_nodes(tree.l)
    else
        return 1 + count_nodes(tree.l) + count_nodes(tree.r)
    end
end

I am trying to write a generic version of this as a map reduce, to aggregate any sort of property. Here is my current attempt:

function tree_mapreduce(f::F, op::G, tree::Node) where {F<:Function,G<:Function}
    if tree.degree == 0
        return f(tree)
    elseif tree.degree == 1
        return op(f(tree), tree_mapreduce(f, op, tree.l))
    else
        return op(f(tree), tree_mapreduce(f, op, tree.l), tree_mapreduce(f, op, tree.r))
    end
end

This is a good start, but it’s not entirely the same speed:

# example tree:
t = Node(2, Node(1, Node(0)), Node(0))
for _=1:4
    t = Node(2, Node(1, t), Node(1, t))
end

@btime count_nodes(t)  # 385 ns
@btime tree_mapreduce(_ -> 1, +, t)  # 418 ns

any idea why it’s not getting the same performance?

Perhaps it can’t infer the return type correctly?

I get the exact same perfrormance for both on my laptop w/Julia 1.9:

julia> @btime count_nodes(t)
  201.108 ns (0 allocations: 0 bytes)
109

julia> @btime tree_mapreduce(_ -> 1, +, t)
  198.660 ns (0 allocations: 0 bytes)
109

Thanks for the sanity check. It seems like I just forgot to turn on -O3 or something, I have no idea went wrong. Indeed looks like they are the same speed.

Cheers!
Miles