Allocations and time of running the same program twice are orders of magnitude larger than running them separately

Just a quick update to close this out. It does seem that using @benchmark to evaluate an in-place fixed-point iteration function without resetting the initial values is what generated the discrepancy in the benchmark results.

In summary, on the first run the benchmarked code takes δ_test = copy(δ_logit) as its initial value and returns the fixed point in that same array. In subsequent samples it starts from the final iterate of the previous run and, since that is already the fixed point, returns it right away (after verifying that it is indeed a fixed point). If you think really hard about it, this is expected behavior.
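For anyone landing here later, this is a minimal sketch of the effect, not my actual code. The solver `fixed_point!` and the map `x -> cos(x)` are made-up stand-ins, with `x0` playing the role of δ_logit and `x` the role of δ_test:

```julia
# Hypothetical stand-in for the in-place solver: iterates x <- cos.(x),
# overwrites `x` with the result, and returns the number of iterations used.
function fixed_point!(x::Vector{Float64}; tol = 1e-12, maxiter = 10_000)
    for iter in 1:maxiter
        err = 0.0
        for i in eachindex(x)
            xnew = cos(x[i])
            err = max(err, abs(xnew - x[i]))
            x[i] = xnew
        end
        # Stop as soon as one more application of the map changes nothing.
        err < tol && return iter
    end
    return maxiter
end

x0 = zeros(3)                    # plays the role of δ_logit
x = copy(x0)                     # plays the role of δ_test

first_call = fixed_point!(x)     # starts far from the fixed point
second_call = fixed_point!(x)    # starts AT the fixed point left by the first call

println("iterations, first call:  ", first_call)
println("iterations, second call: ", second_call)
```

The second call only needs a single pass to verify that it is already at the fixed point, which is what every @benchmark sample after the first one was measuring.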

The way to avoid this is to explicitly reset the initial value each time, as my wrapper function evaluate does in this comment. I have modified my code substantially to avoid all the cumbersome views, and can confirm this is the case. I also managed to decrease allocations by pre-allocating temporary arrays, making my code type-stable, and passing all inputs into the function as arguments.
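Here is a rough sketch of the reset idea, again with made-up names (`fixed_point!` and `solve_from_start` are illustrative, not the functions from my code). The wrapper copies the initial value on every call, so each call does the full amount of work:

```julia
# Hypothetical in-place solver (same toy map as before: x <- cos.(x)).
function fixed_point!(x::Vector{Float64}; tol = 1e-12, maxiter = 10_000)
    for iter in 1:maxiter
        err = 0.0
        for i in eachindex(x)
            xnew = cos(x[i])
            err = max(err, abs(xnew - x[i]))
            x[i] = xnew
        end
        err < tol && return iter
    end
    return maxiter
end

# Hypothetical wrapper: always restart from a fresh copy of the initial value,
# so the caller's `x0` is never overwritten.
function solve_from_start(x0::Vector{Float64})
    x = copy(x0)
    iters = fixed_point!(x)
    return x, iters
end

x0 = zeros(3)
sol1, iters1 = solve_from_start(x0)
sol2, iters2 = solve_from_start(x0)

println("iterations: ", iters1, " and ", iters2)   # same count both times
println("x0 untouched: ", x0 == zeros(3))
```

With BenchmarkTools, I believe either `@benchmark solve_from_start($x0)` or `@benchmark fixed_point!(x) setup=(x = copy($x0)) evals=1` gives every sample a fresh start. The `evals=1` matters in the second form because `setup` runs once per sample, not once per evaluation. Note that the wrapper version also times the `copy`, while the `setup` version does not.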

Thank you all for all your help! Your suggestions improved the code speed by 10x (ish!).