Should `literal_pow` optimize for bigger exponents?

`prod(ntuple(...))` is a very suboptimal way to compute large powers, since it performs one multiplication per factor. You want to use repeated squaring, or more generally an optimal addition chain. This is implemented in:

but it isn’t the default for `literal_pow` because it is slightly less accurate (for floating-point types).
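
To make the difference in multiplication count concrete, here is a minimal sketch of both approaches. The names `naive_pow` and `pow_by_squaring` are hypothetical helpers written for this post, not the implementation referenced above, and the sketch only handles exponents `n >= 1`.

```julia
# Naive approach: builds an n-tuple of x and multiplies it out,
# which costs n - 1 multiplications.
naive_pow(x, n::Integer) = prod(ntuple(_ -> x, n))

# Hypothetical helper: binary exponentiation (repeated squaring).
# Uses O(log n) multiplications instead of n - 1.
function pow_by_squaring(x, n::Integer)
    n >= 1 || throw(DomainError(n, "this sketch only handles n >= 1"))
    result = one(x)
    base = x
    while n > 0
        if isodd(n)
            result *= base      # fold in the current power of x
        end
        n >>= 1
        n > 0 && (base *= base) # square only if more bits remain
    end
    return result
end
```

For integers the two give identical results. For floating-point inputs they multiply in a different order, so the last few bits can differ, which is the accuracy trade-off mentioned above.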

LLVM only performs this transformation by default for integer types; for floating-point types you have to opt in with `@fastmath`, because it changes (worsens) the roundoff errors. The FastPow package extends this to other types beyond the small set of built-in types supported by LLVM.
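
A small sketch of the opt-in, assuming a `Float64` argument and a literal exponent. `pow5` and `pow5_fast` are hypothetical names, and how the fast version is actually lowered depends on your Julia and LLVM versions, so the sketch only compares results up to roundoff.

```julia
# Default semantics: x^5 goes through Base.literal_pow.
pow5(x) = x^5

# Opting in to fast-math semantics lets the compiler reassociate
# the multiplications, at the cost of possibly different roundoff.
pow5_fast(x) = @fastmath x^5

x = 1.1
# The two results should agree closely, but not necessarily bit-for-bit.
agree = isapprox(pow5(x), pow5_fast(x); rtol=1e-12)
```

If you want to see what the compiler generated, `@code_llvm pow5_fast(1.1)` (from `InteractiveUtils`, loaded by default in the REPL) prints the LLVM IR.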
