Interestingly a simplified benchmark does not bear this out (was going
to file an issue on the Julia repo). Although my total runtime is
shorter with the untyped fields in the benchmarks game benchmark,
traversing the tree recursively seems to be slower when replicated
with untyped fields:
julia> struct NodeA
l::Union{NodeA,Nothing}
r::Union{NodeA,Nothing}
end
julia> struct NodeB
l
r
end
julia> make(n, T) = n == 1 ? T(nothing, nothing) : T(make(n-1, T), make(n-1, T))
make (generic function with 1 method)
julia> countnodes(node) = node === nothing ? 1 : 1 + countnodes(node.l) + countnodes(node.r)
countnodes (generic function with 1 method)
julia> using BenchmarkTools
julia> @benchmark make(21, NodeA)
BenchmarkTools.Trial:
memory estimate: 64.00 MiB
allocs estimate: 2097151
--------------
minimum time: 349.933 ms (0.00% GC)
median time: 358.420 ms (0.00% GC)
mean time: 437.984 ms (19.46% GC)
maximum time: 688.123 ms (51.11% GC)
--------------
samples: 12
evals/sample: 1
julia> @benchmark make(21, NodeB)
BenchmarkTools.Trial:
memory estimate: 64.00 MiB
allocs estimate: 2097151
--------------
minimum time: 389.716 ms (0.00% GC)
median time: 394.468 ms (0.00% GC)
mean time: 475.543 ms (17.67% GC)
maximum time: 725.103 ms (48.57% GC)
--------------
samples: 12
evals/sample: 1
julia> tree_a = make(21, NodeA);
julia> tree_b = make(21, NodeB);
julia> @benchmark countnodes($tree_a)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 24.632 ms (0.00% GC)
median time: 25.061 ms (0.00% GC)
mean time: 25.345 ms (0.00% GC)
maximum time: 33.658 ms (0.00% GC)
--------------
samples: 198
evals/sample: 1
julia> @benchmark countnodes($tree_b)
BenchmarkTools.Trial:
memory estimate: 127.97 KiB
allocs estimate: 8190
--------------
minimum time: 95.624 ms (0.00% GC)
median time: 97.027 ms (0.00% GC)
mean time: 98.077 ms (0.00% GC)
maximum time: 109.683 ms (0.00% GC)
--------------
samples: 52
evals/sample: 1
No real detectable difference in time to construct the tree
though. This result is consistent with even larger trees:
julia> tree_a = make(25, NodeA);
julia> tree_b = make(25, NodeB);
julia> @benchmark countnodes($tree_a)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 416.255 ms (0.00% GC)
median time: 428.381 ms (0.00% GC)
mean time: 436.403 ms (0.00% GC)
maximum time: 498.951 ms (0.00% GC)
--------------
samples: 12
evals/sample: 1
julia> @benchmark countnodes($tree_b)
BenchmarkTools.Trial:
memory estimate: 2.00 MiB
allocs estimate: 131070
--------------
minimum time: 1.585 s (0.00% GC)
median time: 1.592 s (0.00% GC)
mean time: 1.662 s (0.00% GC)
maximum time: 1.881 s (0.00% GC)
--------------
samples: 4
evals/sample: 1
And just for the heck of it, smaller trees:
julia> tree_b = make(12, NodeB);
julia> tree_a = make(12, NodeA);
julia> @benchmark countnodes($tree_a)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 14.210 μs (0.00% GC)
median time: 14.282 μs (0.00% GC)
mean time: 14.486 μs (0.00% GC)
maximum time: 54.675 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark countnodes($tree_b)
BenchmarkTools.Trial:
memory estimate: 224 bytes
allocs estimate: 14
--------------
minimum time: 138.993 μs (0.00% GC)
median time: 139.330 μs (0.00% GC)
mean time: 155.135 μs (0.00% GC)
maximum time: 553.816 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
I’m getting quite confused about where the observed speedup is coming
from. Maybe there’s some other difference I’m not seeing in the code I
submitted above compared to the earlier submissions.
EDIT: Aaand it turns out the type annotations were a red herring. With
nothing else changed but adding the type annotations back in, I
saw the following when benchmarking the whole program. The type
annotations improve rather than degrade performance; as expected.
❮ hyperfine --warmup 1 'julia -p 3 -O3 binary-trees-ab-untyped-struct.jl 21 > myoutput21.txt'
Benchmark #1: julia -p 3 -O3 binary-trees-ab-untyped-struct.jl 21 > myoutput21.txt
Time (mean ± σ): 19.054 s ± 3.582 s [User: 37.494 s, System: 1.082 s]
Range (min … max): 17.351 s … 29.153 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
~/repos/benchmark-game/binary-trees master* ⇡ 3m 29s
❯ hyperfine --warmup 1 'julia -p 3 -O3 binary-trees-ab-retyped-struct.jl 21 > myoutput21.txt'
Benchmark #1: julia -p 3 -O3 binary-trees-ab-retyped-struct.jl 21 > myoutput21.txt
Time (mean ± σ): 13.234 s ± 0.758 s [User: 28.711 s, System: 1.082 s]
Range (min … max): 12.736 s … 15.224 s 10 runs
Something else I changed caused the speedup; not the untyped fields.