Potential solution to the overflow problem; 64-bit considered harmful, 32- or 21-bit better

Functions are not a problem, but loops are; I explain a partial or full solution under B. And you don't need to know or calculate the bound yourself: Julia gives you the types, and you would maybe only need a debug mode or a linter to see them. The overflow/safety problem turns into an optimization problem, which compilers and linters can be good at, i.e. avoiding, or alerting you when, your code goes to a problematically larger/slower type or BigInt.
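As a rough illustration of "Julia gives you the types", a linter-like check can ask type inference which type a function returns for given argument types, without running it. This is a sketch only: `is_wide`, `area` and `volume` are hypothetical names, `Base.return_types` is Julia's internal (unexported) reflection function, and Base's `widemul` stands in for the proposed widening operators. With Int32 arguments it should flag only `volume`, whose inferred result is Int128.

```julia
# Hypothetical linter rule: flag results wider than 64 bits.
is_wide(T) = T === BigInt || (T <: Base.BitInteger && sizeof(T) > 8)

# Two example functions; widemul stands in for a widening *.
area(w, h) = widemul(w, h)                      # Int32 inputs -> Int64
volume(w, h, d) = widemul(widemul(w, h), d)     # Int32 inputs -> Int128

for (f, argtypes) in ((area, (Int32, Int32)), (volume, (Int32, Int32, Int32)))
    # Base.return_types runs type inference without calling the function
    for T in Base.return_types(f, argtypes)
        is_wide(T) && println("warning: ", f, argtypes, " returns ", T)
    end
end
```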

A.
There are only 4 elementary integer operations to consider, and Julia can take care of them if they are simply redefined as explained. They are * (which you consider simple); + and - (which are equivalent to each other and simpler still, adding only one bit); and then there's the pesky ^ power operator.
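To make "adding only one bit" concrete, here is a minimal sketch of the width rules, assuming signed two's-complement integers whose width n includes the sign bit: the exact result of a + b or a - b needs at most max(na, nb) + 1 bits, and a * b at most na + nb bits. `bits_add`, `bits_mul` and `container_type` are hypothetical helpers, not part of Julia; `widemul` is an existing Base function that multiplies in the next wider built-in type.

```julia
# Hypothetical helpers (not part of Julia): bits needed for the exact result,
# for signed two's-complement inputs of widths na and nb (sign bit included).
bits_add(na::Int, nb::Int) = max(na, nb) + 1   # + and - add at most one bit
bits_mul(na::Int, nb::Int) = na + nb           # * needs the sum of the widths

# Smallest built-in signed type with at least n bits, else BigInt.
function container_type(n::Int)
    for T in (Int8, Int16, Int32, Int64, Int128)
        n <= 8 * sizeof(T) && return T
    end
    return BigInt
end

@show container_type(bits_add(21, 21))   # Int32  (22 bits)
@show container_type(bits_mul(21, 21))   # Int64  (42 bits)
@show container_type(bits_mul(42, 42))   # Int128 (84 bits)

# Base already offers a widening multiply for its fixed-size integers:
p = widemul(typemin(Int32), typemin(Int32))   # (-2^31)^2 = 2^62, computed in Int64
@show typeof(p)                               # Int64
```

So under these rules a product of two Int21 values always fits in an Int64, while a product of two 42-bit values already needs Int128.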

Julia already special-cases some literal powers, e.g. x^2 and x^3, making them simple: they are basically repeated *, which is already simple. For an arbitrary power x^p, I argue that the result is mathematically a rational, either v/1 or 1/v (for negative p), where v can be arbitrarily large. This case can be handled with floating point, which does the overflow checking for you for free (an overflowing result becomes Inf instead of wrapping around). It's not modular arithmetic, but modular arithmetic isn't clearly better, so you would need to opt into it; and the other operators wouldn't be modular any longer either.
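A small sketch of the difference, showing Julia's current behavior for Int (an overflowing ^ wraps silently, and a negative runtime exponent throws a DomainError) next to a hypothetical `pow_float` that promotes to floating point the way / does. `pow_float` is illustrative only; it uses Float64, and a Float32 version would behave the same way, just overflowing to Inf much earlier.

```julia
# Today: integer ^ stays an integer and wraps around on overflow.
x, p = 2, 64
@show x^p                          # 0, since 2^64 wraps to 0 in Int64
q = -1
try
    x^q                            # negative integer powers are rejected for Int
catch err
    @show err isa DomainError      # true
end

# Hypothetical alternative: promote to floating point, as / already does.
pow_float(x::Integer, p::Integer) = float(x)^p

@show pow_float(2, 64)             # 1.8446744073709552e19 (exact, a power of two)
@show pow_float(2, -1)             # 0.5, the 1/v case
@show pow_float(2, 1024)           # Inf: overflow is visible, no wraparound
```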

Division (/) already returns Float64 for integer arguments, and ^ could do the same, for consistency (though I'm rather considering Float32). Just as with the current division, this is a compromise compared with rationals or, e.g., DecFP.
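For comparison, here is what / and the exact alternative // return today, plus one property that matters if Float32 is chosen: Float32 has a 24-bit significand, so it represents every integer up to 2^24 in magnitude exactly (which covers the whole Int21 range), while Float64 does so up to 2^53. `maxintfloat` is the Base function that reports these limits; the rest is plain arithmetic.

```julia
x, y = 7, 2
@show x / y                     # 3.5, a Float64
@show x // y                    # 7//2, an exact Rational{Int64}
@show Float32(x) / Float32(y)   # 3.5f0, the Float32 variant

@show maxintfloat(Float32) == 2^24   # true
@show maxintfloat(Float64) == 2^53   # true

# The largest Int21 value survives a round trip through Float32 ...
@show Int32(Float32(2^20 - 1)) == 2^20 - 1    # true
# ... but not every Int32 value does: 2^24 + 1 rounds to 2^24.
@show Float32(2^24 + 1) == 2^24 + 1           # false
```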

Every larger piece of code is built up from these, and Julia's type inference takes care of the rest. Except:

At some point you need to store your value in some memory location (e.g. an array) whose type may differ from that of your inputs. That is where the runtime check can go, amortized over a long chain of calculations; and if you actually want modular arithmetic, the wraparound can also be postponed until then. (If you did promote to floating point, storing a non-integral or infinite value into an integer array is already an error, but x^p could keep returning an integer when p is an unsigned integer (UInt), a BigInt if need be, as is appropriate in that case.)
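A minimal sketch of checking once at the store rather than on every operation, assuming the values were computed in a wider type (here Int64) and the destination is an Int32 array. `store_checked!` and `store_modular!` are hypothetical helpers: the first relies on the constructor `Int32(x)`, which throws an InexactError when x does not fit, and the second on `x % Int32`, Julia's explicit wraparound conversion that keeps only the low 32 bits.

```julia
# Hypothetical helpers: the range check happens once, when the value is stored.
function store_checked!(dest::AbstractVector{T}, i::Integer, x::Integer) where {T<:Integer}
    dest[i] = T(x)      # throws InexactError if x is out of range for T
end

function store_modular!(dest::AbstractVector{T}, i::Integer, x::Integer) where {T<:Integer}
    dest[i] = x % T     # opt-in modular arithmetic: wrap around instead of failing
end

dest = zeros(Int32, 2)
a, b = Int32(50_000), Int32(50_000)

@show a * b             # plain Int32 arithmetic wraps silently: -1794967296
wide = Int64(a) * b     # exact in the wider type: 2500000000

store_modular!(dest, 2, wide)      # stores the same wrapped value, but by request
try
    store_checked!(dest, 1, wide)  # the single check catches the overflow
catch err
    @show err isa InexactError     # true
end
```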

B.
Now for loops, if you do:

julia> total = Int21(0)  # Int21 is the proposed type; this could even start at typemax(Int21)
julia> for i in 1:8_000_000_000_000
         total += 1      # and the increment could even be typemax(Int21)
       end

It looks like total would overflow, but it wouldn't (necessarily). Yes, I proposed Int21 so that it could fit in a 32-bit location (and register), but on modern CPUs the registers are 64-bit. A 64-bit register can absorb 2^63 / 2^20 = 2^43 ≈ 8.8 × 10^12 additions of typemax(Int21) before it overflows, so even the worst case of this loop is safe. (At one addition per cycle on a 5 GHz CPU, 2^43 additions take roughly half an hour; with an increment of 1, overflowing would take decades.) Then you could do the sub-nanosecond runtime check once, at the end of the loop, easily beating the overhead (possibly 15%, at least in the worst case) you would get with Base.checked_add on every increment. It's just an example, and you lose this advantage with the larger types you might go to. But the principle is the same: you could unroll the loop a few times and check every 16 loop iterations, and it would be safe for arbitrary loop counts, given types up to Int47. Everything larger could go to BigInt.
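The deferred and chunked checks can be sketched in plain Julia. This is an illustration under stated assumptions, not the proposal itself: Julia has no Int21, so Int32 terms stand in for it (which lowers the single-check bound from 2^43 to 2^32 terms), and `sum_deferred`, `sum_chunked` and the `limit` keyword are hypothetical names. The chunked version switches to BigInt when the 64-bit accumulator gets close to overflowing.

```julia
# Deferred check: one check after the loop instead of checked_add per step.
# With n <= 2^32 Int32 terms, acc stays within [-2^63, 2^63 - 1], so the
# plain Int64 additions cannot overflow.
function sum_deferred(::Type{T}, xs::AbstractVector{Int32}) where {T<:Integer}
    length(xs) <= Int64(2)^32 ||
        throw(ArgumentError("too many terms for a single deferred check"))
    acc = zero(Int64)
    for x in xs
        acc += x                 # Int64 + Int32 -> Int64, no per-step check
    end
    return T(acc)                # the one check: InexactError if acc does not fit T
end

# Chunked check for any length: up to 16 unchecked adds, then one test.
# 16 Int32 terms move acc by at most 16 * 2^31 = 2^35, so if |acc| <= limit
# (with limit <= 2^62) before a chunk, the chunk cannot overflow Int64.
function sum_chunked(xs::AbstractVector{Int32}; limit::Int64 = Int64(2)^62)
    0 <= limit <= Int64(2)^62 || throw(ArgumentError("limit must be in [0, 2^62]"))
    acc = zero(Int64)
    i, hi = firstindex(xs), lastindex(xs)
    while i <= hi
        stop = min(i + 15, hi)
        for j in i:stop
            acc += xs[j]         # unchecked Int64 add
        end
        i = stop + 1
        if abs(acc) > limit      # one cheap comparison per 16 terms
            total = big(acc)     # acc is still exact here; continue in BigInt
            for j in i:hi
                total += xs[j]
            end
            return total
        end
    end
    return acc
end
```

Usage: `sum_deferred(Int64, fill(typemax(Int32), 1000))` and `sum_chunked(fill(typemax(Int32), 1000))` both return the exact Int64 sum. Note that `sum_chunked` returns either an Int64 or a BigInt, a type instability a real implementation would probably want to avoid.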