Repeatedly multiplying a Float64 is fastest in chunks of <51 terms

I suspect `@btime` is lying here. (Not lying about the micro-benchmark itself, but misrepresenting what you would see in general use.)

I happened to read some of this post just before yours:
PSA: Microbenchmarks remember branch history

I suspect that the small code size of this very simplistic micro-benchmark, which calls the same function with the same inputs every time, is skewing what you're seeing.

Granted, there might be some value in the argument based on the number of available registers, which could make chunks below a certain cut-off genuinely faster.

I would be more inclined to believe a test where you:

1. create an array of 1000 acceptable inputs,
2. randomly permute that list,
3. write a function whose for loop runs through the whole list of inputs, and
4. `@time` or even `@btime` that function.

Hopefully that provides enough variability to defeat unrealistic tricks and gives a more realistic picture of what you would see in practice, where every call gets a different input.
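
Something like this rough sketch, where `repmul` is just a hypothetical stand-in for your actual chunked-multiplication function and the choice of 1000 values near 1.0 is arbitrary:

```julia
using BenchmarkTools, Random

# Hypothetical stand-in for the function being benchmarked:
# n repeated multiplications of a single Float64.
function repmul(x::Float64, n::Int)
    acc = 1.0
    for _ in 1:n
        acc *= x
    end
    return acc
end

# 1. An array of 1000 acceptable inputs (values near 1.0 to avoid over/underflow).
inputs = rand(1000) .+ 0.5

# 2. Randomly permute the list (only matters if the inputs were generated
#    in some systematic order).
shuffle!(inputs)

# 3. A function whose loop runs through the whole list of inputs,
#    accumulating the results so the work can't be optimized away.
function run_all(xs, n)
    s = 0.0
    for x in xs
        s += repmul(x, n)
    end
    return s
end

# 4. Benchmark it; divide by length(inputs) for a per-call figure.
@btime run_all($inputs, 50)
```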
