Running small functions bundled in an outer function takes twice as long as running them separately

I suspect it has to do with some of the data fitting into a lower-level (faster) CPU cache in the individual benchmarks, but only into a higher-level (slower) cache in the combined one. I can reproduce the phenomenon on Julia 1.4, although repeated runs give somewhat inconsistent timings.

In any case, making x and v 10x larger resolves the inconsistency for me (you may have to increase them further on a recent desktop CPU; mine is a puny laptop CPU with little cache), e.g.:

julia> @btime toPolar!($x)
  4.520 ms (0 allocations: 0 bytes)

julia> @btime toCartesian!($x)
  1.284 ms (0 allocations: 0 bytes)

julia> @btime move!($x, $v, $T)
  148.501 μs (0 allocations: 0 bytes)

julia> @btime outerFunction!($x, $v, $T)
  5.503 ms (0 allocations: 0 bytes)

Which benchmark is relevant depends on your data size, I guess.
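To see where the crossover happens on your own machine, you could sweep the array size and compare the sum of the separate timings with the combined one. The sketch below is only illustrative: the bodies of `toPolar!`, `toCartesian!`, `move!` and `outerFunction!` are hypothetical stand-ins (the real ones are in the original question), the 2×N layout of `x` and `v` is an assumption, and the helpers `mintime` and `sweep` are made up for this example. It uses Base's `@elapsed` with a minimum over repeats instead of BenchmarkTools' `@btime`, so it runs without extra packages but is cruder. If the cache explanation holds, the gap between "separate" and "combined" should change noticeably once the arrays outgrow your caches.

```julia
# Hypothetical stand-ins for the question's functions; x and v are assumed
# to be 2×N matrices with one point (or velocity) per column.
function toPolar!(x::Matrix{Float64})
    @inbounds for j in axes(x, 2)
        a, b = x[1, j], x[2, j]
        x[1, j] = hypot(a, b)   # radius
        x[2, j] = atan(b, a)    # angle
    end
    return x
end

function toCartesian!(x::Matrix{Float64})
    @inbounds for j in axes(x, 2)
        r, θ = x[1, j], x[2, j]
        x[1, j] = r * cos(θ)
        x[2, j] = r * sin(θ)
    end
    return x
end

function move!(x::Matrix{Float64}, v::Matrix{Float64}, T::Float64)
    @inbounds for j in axes(x, 2), i in axes(x, 1)
        x[i, j] += v[i, j] * T
    end
    return x
end

# Hypothetical fused version: the three passes run back to back over x.
function outerFunction!(x::Matrix{Float64}, v::Matrix{Float64}, T::Float64)
    toPolar!(x)
    toCartesian!(x)
    move!(x, v, T)
    return x
end

# Minimum wall-clock time over `reps` runs (a crude stand-in for @btime).
# Note: the functions mutate their input, so each run sees slightly changed data.
function mintime(f, args...; reps = 20)
    f(args...)                      # warm-up call so compilation is not timed
    best = Inf
    for _ in 1:reps
        best = min(best, @elapsed f(args...))
    end
    return best
end

# Compare sum-of-separate vs combined timings for several array sizes.
function sweep(sizes = (10^3, 10^4, 10^5, 10^6); T = 0.01)
    results = Tuple{Int,Float64,Float64}[]
    for n in sizes
        x = rand(2, n)
        v = rand(2, n)
        separate = mintime(toPolar!, x) + mintime(toCartesian!, x) +
                   mintime(move!, x, v, T)
        combined = mintime(outerFunction!, x, v, T)
        push!(results, (n, separate, combined))
        println("n = $n: sum of separate = $(round(1e6 * separate; digits = 1)) μs, ",
                "combined = $(round(1e6 * combined; digits = 1)) μs")
    end
    return results
end

sweep()
```

Once you know the sizes you actually care about, it is worth re-checking those specific points with BenchmarkTools, since its statistics are more robust than a minimum over 20 runs.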

Also, cf
