I’m seeing a weird performance issue related to the presence of an if-statement in my function, regardless of whether the if-statement is triggered. I can only reproduce this in 1.9.0, not earlier Julia versions.
I have a function called tree_mapreduce which has two different modes of behavior configured by the preserve_sharing flag:
function tree_mapreduce(
    args...;
    preserve_sharing::Bool=false,
    result_type::Type{RT}=Nothing,
) where {RT}
    if preserve_sharing && result_type != Nothing
        # Sharing-preserving mode: pass a cache keyed by node identity
        return _tree_mapreduce(args..., IdDict{Node,RT}())
    else
        return _tree_mapreduce(args...)
    end
end
(simplified version; here is the full implementation)
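To make the two modes concrete, here is a toy sketch of the same pattern. ToyNode, toy_copy, and toy_copy_node are hypothetical names made up for illustration; this is not the DynamicExpressions implementation, just the general idea of using an IdDict to copy a shared subtree only once.

```julia
# Hypothetical toy node type; NOT the DynamicExpressions.Node definition.
mutable struct ToyNode
    val::Float64
    children::Vector{ToyNode}
end
ToyNode(val) = ToyNode(val, ToyNode[])

# Plain recursive copy: a subtree referenced twice is copied twice.
toy_copy(n::ToyNode) = ToyNode(n.val, ToyNode[toy_copy(c) for c in n.children])

# Sharing-preserving copy: the IdDict maps each original node (by identity)
# to its copy, so a shared subtree is copied once and then reused.
function toy_copy(n::ToyNode, seen::IdDict{ToyNode,ToyNode})
    return get!(seen, n) do
        ToyNode(n.val, ToyNode[toy_copy(c, seen) for c in n.children])
    end
end

# Wrapper that picks a mode from a keyword flag, mirroring the function above.
function toy_copy_node(n::ToyNode; preserve_sharing::Bool=false)
    if preserve_sharing
        return toy_copy(n, IdDict{ToyNode,ToyNode}())
    else
        return toy_copy(n)
    end
end

shared = ToyNode(1.0)
root = ToyNode(0.0, [shared, shared])   # the same child object appears twice

a = toy_copy_node(root; preserve_sharing=false)
b = toy_copy_node(root; preserve_sharing=true)
```

In this sketch, the two children of a are distinct objects, while the two children of b are the same object, which is the only behavioral difference between the modes.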
This function is used for copies of my Node{T} type in the copy_node function. When _tree_mapreduce is given the extra IdDict argument, it has slightly different behavior. With this implementation, I get the following benchmark times:
using DynamicExpressions
using BenchmarkTools  # provides @btime
# Create expression tree
operators = OperatorEnum(; binary_operators=[+, -, *, /], unary_operators=[cos, sin])
x1, x2, x3 = (i -> Node(Float64; feature=i)).(1:3)
tree = cos(x1 * 3.2 - 5.8) * 0.2 - 0.5 * x2 * x3 * x3 + 0.9 / (x1 * x1 + 1)
# Benchmark:
@btime copy_node(tree; preserve_sharing=false)
This gives me 682.752 ns (26 allocations: 2.02 KiB). Note that preserve_sharing is set to false.
However, when I simply change the function to be
function tree_mapreduce(
    f_leaf::F1,
    f_branch::F2,
    op::G,
    tree::N;
    preserve_sharing::Bool=false,
    result_type::Type{RT}=Nothing,
) where {T,N<:Node{T},F1<:Function,F2<:Function,G<:Function,RT}
    #if preserve_sharing
    #    return _tree_mapreduce(f_leaf, f_branch, op, tree, IdDict{N,RT}())
    #else
    return _tree_mapreduce(f_leaf, f_branch, op, tree)
    #end
end
and re-run the benchmark, I obtain the result 285.015 ns (24 allocations: 1.88 KiB).
Why is the simple presence of that if-statement harming performance so dramatically? I note that for preserve_sharing=true, the run time is significantly higher, so I don’t think that branch is actually being executed when the flag is false.
Additional notes:
- I ran the above benchmarks with -O3. Running without it seems to make the second benchmark slightly worse, so the two timings end up slightly closer together.
- I ran the benchmarks with the return type annotated on tree_mapreduce, which didn’t seem to change the performance.
- On Julia 1.8.5, the timings are nearly identical.
- Another weird thing I noticed is that if I am using Revise.jl, make this edit, and then revert it, the time of ~700 ns goes down to ~460 ns, even though it’s the identical function. Without the Revise round trip it stays at ~700 ns, even if I call the function many, many times. It’s almost like the compilation is improved by Revising the function multiple times.
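One way to narrow this down would be to check whether the keyword wrapper is still inferred to a concrete return type with the branch present. Below is a minimal sketch of that check. The names wrapper and _inner are hypothetical stand-ins made up for illustration, not the DynamicExpressions functions, and the sketch uses only the Test and InteractiveUtils standard libraries.

```julia
using Test              # stdlib, provides @inferred
using InteractiveUtils  # stdlib, provides @code_warntype (preloaded in the REPL)

# Hypothetical stand-ins for the two inner methods; NOT the real _tree_mapreduce.
_inner(x::Vector{Float64}) = sum(x)
function _inner(x::Vector{Float64}, cache::IdDict{Any,Float64})
    haskey(cache, x) && return cache[x]
    return cache[x] = sum(x)
end

# Same shape as the wrapper in question: a Bool keyword selects the call.
function wrapper(x::Vector{Float64}; preserve_sharing::Bool=false)
    if preserve_sharing
        return _inner(x, IdDict{Any,Float64}())
    else
        return _inner(x)
    end
end

x = rand(8)

# @inferred throws if the inferred return type does not match the type of
# the value actually returned, so an abstract or Union return type is flagged.
r = @inferred wrapper(x; preserve_sharing=false)

# Prints the typed lowered code; non-concrete types are highlighted in the REPL.
@code_warntype wrapper(x; preserve_sharing=false)
```

If the same @inferred check passes on the real tree_mapreduce, type instability in the wrapper is probably not the explanation, and the difference may instead come from inlining or specialization decisions. That last part is a guess on my side, not something this sketch demonstrates.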
Edit: added the full function signature as it appears relevant