Doesn’t Mf.L first have to copy the triangular factor into a newly allocated matrix? That copy is O(n^2), the same order as the triangular solve itself, so it is quite a significant part of the total cost.
When you interpolate $(Mf.L) into @btime, the copy is performed once, before the timing starts, so it is excluded from the measured time.
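Here is a rough sketch of how one could check this without BenchmarkTools. It assumes Mf is an LU factorization from `lu` (the thread may use a different factorization, and not every factorization type copies on `.L`); the matrix size and the helper names `get_L`, `solve_with_copy`, and `solve_only` are made up for illustration.

```julia
using LinearAlgebra

n = 500
A = rand(n, n) + n * I        # arbitrary well-conditioned test matrix (assumption)
b = rand(n)
Mf = lu(A)                    # assumption: Mf is an LU factorization

get_L(F) = F.L                # small wrapper around the property access
get_L(Mf)                     # warm-up call so compilation is not measured

# Memory allocated by a single Mf.L; for LU this is roughly one n-by-n matrix.
bytes = @allocated get_L(Mf)
println("Mf.L allocated ", bytes, " bytes; an n-by-n Float64 matrix needs ", n^2 * sizeof(Float64))

# Mimic the two benchmark variants:
L = Mf.L                            # extracted once up front, like $(Mf.L)
solve_with_copy(F, b) = F.L \ b     # extraction happens inside the timed call
solve_only(L, b) = L \ b            # only the solve is timed

solve_with_copy(Mf, b); solve_only(L, b)   # warm-up

t_with = @elapsed solve_with_copy(Mf, b)
t_without = @elapsed solve_only(L, b)
println("with extraction: ", t_with, " s, solve only: ", t_without, " s")
```

A single `@elapsed` sample is noisy, so treat the timings as indicative only. With BenchmarkTools, `@btime $Mf.L \ $b` keeps the `.L` extraction inside the timed expression, whereas `@btime $(Mf.L) \ $b` evaluates it once before timing.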