No, because even if you get to `Int64` and use it subsequently, it will still be checked (unlike the status quo). But yes, you want to avoid going that far, for both safety AND performance reasons: with `Int21` you have a budget of three multiplies before any slowdown (or more, as explained with my polynomial example, or 43 additions, since each addition widens the result type by one bit and 21 + 43 = 64). No matter the size of the resulting type you end up with, it will be checked. As I just explained, we could go to `Int128` next, but likely you want to go to `BigInt` directly (and possibly it will have a fast path for values that would have fit into `Int64` or `Int128`).
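A minimal sketch of this widen-instead-of-wrap idea (the `widen_add` name and the widening policy are my illustration, not an existing API; `Base.Checked.add_with_overflow` and `widen` are real Julia functions, with `widen(Int64) == Int128` and `widen(Int128) == BigInt`):

```julia
using Base.Checked: add_with_overflow

# Add without ever wrapping: try the checked add, and on overflow
# return the exact result in the next-wider type instead of throwing.
widen_add(x::BigInt, y::BigInt) = x + y   # BigInt never overflows
function widen_add(x::T, y::T) where {T<:Union{Int64,Int128}}
    s, overflowed = add_with_overflow(x, y)
    return overflowed ? widen(x) + widen(y) : s
end
```

For example, `widen_add(typemax(Int64), Int64(1))` returns the exact sum as an `Int128` rather than wrapping to `typemin(Int64)`.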
Yes, it will be faster, because there will be no check as long as the result fits in one register, i.e. in the common case (for `Int63` inputs to addition, and seemingly `Int64` too; for multiplication of `Int64` with `Base.checked_mul` it seems much worse than my idea). You get three-byte assembly code with my scheme, just an `add` (or `addq` or similar), which is as fast as a non-checked 32-bit add (and seems to be shorter code than the 64-bit add, which gets me `lea rax, [rdi + rsi]`).
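You can compare the generated code on your own machine (the function names here are just for illustration; `Base.checked_add` and `@code_native` are real Julia API):

```julia
plain_add(x::Int64, y::Int64) = x + y                  # typically one lea/add
checked(x::Int64, y::Int64) = Base.checked_add(x, y)   # add + jo + error path

# Uncomment to inspect the assembly for each:
# @code_native plain_add(1, 2)
# @code_native checked(1, 2)
```

Both return the same value in the non-overflowing case; only `checked` pays for the branch and the out-of-line error call.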
The kicker is that it will work with SIMD code too (and provide 2x memory bandwidth), unlike all other overflow-check schemes:

> No flags are touched by the underlying PADDD instruction. So to test this, you have to write additional code, depending on what you want to do.
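A sketch of why a spare bit sidesteps that problem (my illustration of the scheme, assuming nonnegative values stored in `Int64` lanes): the lane-wise add stays a plain, vectorizable `+`, and a single OR-reduction over the results detects whether any lane carried into the sign bit, with no per-lane flag inspection at all.

```julia
# For nonnegative Int64 inputs, a 64-bit add overflows exactly when the
# wrapped result is negative, so one reduction checks every lane at once.
function add63(xs::Vector{Int64}, ys::Vector{Int64})
    out = xs .+ ys                        # plain SIMD-able adds, no flags read
    reduce(|, out; init = Int64(0)) < 0 &&
        throw(OverflowError("63-bit add overflowed in some lane"))
    return out
end
```

The compiler is free to vectorize both the adds and the OR-reduction, which is exactly what flag-based checking prevents.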
There’s some reason they’re not the default, and as you show, the code expansion is huge. Non-taken branches are cheap; I can believe 15% overhead or less, though that statistic seems rather high if the branches were the only added code. The really bad thing about those functions is the huge L1 instruction-cache pollution, and also that most people don’t use them, or even know of them. Also, one-letter operators are much better… or zero letters, as in `2x`.