Yes, this seems to be it. Since invert_shares_δ! mutates δ in place, every @benchmark sample after the first starts from the already-converged δ and finishes almost immediately, so @benchmark on the mutating call is not a good way to evaluate the performance of a fixed-point algorithm like mine. The fix is a wrapper function of the following form (passing the other parameters):
```julia
function evaluate(δ_in, Xβ, params)
    δ_out = copy(δ_in)                    # copy so the in-place inversion never touches the caller's δ_in
    invert_shares_δ!(δ_out, Xβ, params)   # each benchmark evaluation therefore starts from the same initial δ
    return δ_out
end
```
This gives the “correct” benchmark performance:
```
BenchmarkTools.Trial: 151 samples with 1 evaluation.
 Range (min … max):  18.512 ms … 378.933 ms  ┊ GC (min … max):  0.00% … 90.32%
 Time  (median):     20.174 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   34.223 ms ±  68.247 ms  ┊ GC (mean ± σ):  39.90% ± 18.29%

  █▁
  ██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▄▅ ▄
  18.5 ms        Histogram: log(frequency) by time         372 ms <

 Memory estimate: 307.06 MiB, allocs estimate: 18968.
```
This is roughly half the time of running the code twice in a row (see my benchmark estimates above), so it makes sense.
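For reference, the wrapper can be benchmarked with a call of roughly this shape (sketch only; δ_init is a placeholder for whatever the starting δ is called, and interpolating the arguments with $ keeps non-const globals from inflating the timing):

```julia
using BenchmarkTools

# Sketch: δ_init, Xβ, and params are placeholder names for the actual inputs.
# $-interpolation avoids the dispatch overhead of non-const global variables.
@benchmark evaluate($δ_init, $Xβ, $params)
```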
Sad news—my code is slower than I thought! I guess I will have to make it faster.
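As an aside, BenchmarkTools can also deal with the mutation directly via a setup phase plus evals=1, so each sample gets a fresh copy and the copy itself is not included in the timing. A sketch with the same placeholder name:

```julia
using BenchmarkTools

# Sketch: the setup runs once per sample (outside the timed region), so the
# copy is not measured. evals=1 is needed because invert_shares_δ! mutates δ,
# and the setup is not rerun between evaluations within a sample.
@benchmark invert_shares_δ!(δ, $Xβ, $params) setup=(δ = copy($δ_init)) evals=1
```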