We could implement FastIntegers.jl, faster than the current default Int64, and eliminate overflow from the language by leveraging Julia’s type system; no hardware support is needed. As a package it of course doesn’t change the language itself; it would be a prototype (actually quite useful as such), but even better if integrated into the language. That could happen for Julia 2.0, but not if people are against the idea.
If we consider even a very simple function:

```julia
double(x) = 2x
```
then it’s not safe for integers, i.e. they can overflow (modular arithmetic is actually used, which not everyone wants or is familiar with). There was a recent discussion about this explaining the pros and cons of overflow checking, with it incurring maybe a 15% slowdown (and worse for SIMD-vectorized code), but there need be no overhead at all, even for SIMD code. We can have our cake and eat it too, potentially even 2x faster or more.
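A quick REPL illustration of that wrap-around with the default Int64:

```julia
julia> double(typemax(Int64))  # 2*(2^63 - 1) wraps modulo 2^64
-2
```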
[You can already get the bandwidth saving, 2x or 4x faster, by opting into smaller integer types, but people shy away from them, especially the smaller they get, since the overflow potential increases. Under this scheme overflow would be eliminated for all types, so there would be no need for that fear. Currently all of Julia’s integer types can overflow except BigInt (and BigFloat).]
We need overflow checking to be fast for the most common operations, such as addition and multiplication, and the fastest way is to avoid it entirely.
It’s best to explain (unsigned) multiplication first, then addition/subtraction.
If you multiply two numbers, the number of bits/digits potentially doubles (at worst; in reality usually only a few bits are added), e.g. 32-bit × 32-bit = 64-bit result (but currently Julia, or the CPU, drops the upper 32 bits; hopefully, for unsigned numbers, they are all zeros). This uses the imull x86 assembly instruction; for 64-bit multiplication imulq is used, which is a bit slower, but the saving would mainly come from using half as much memory bandwidth.
Redefining multiplication this way (and similarly for all the integer types and operators) means there’s never any potential overflow to check for:
```julia
julia> import Base.*

julia> *(x::Int32, y::Int32) = Int64(x) * Int64(y)
```
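With that definition the product is widened instead of wrapped; Base’s built-in widemul does the same widening multiply without redefining *:

```julia
julia> typemax(Int32) * typemax(Int32)  # now an Int64, no wrap-around
4611686014132420609

julia> widemul(typemax(Int32), typemax(Int32))  # Base equivalent
4611686014132420609
```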
But this means you must be willing to use Int32, and you might complain that in Julia Int is 64-bit, but that’s not always the case, nor is it needed or wanted. On the (still supported) 32-bit systems, Int is already Int32, so if you want correct portable code you must test for that possibility anyway. You might argue we need Int64 to address more memory on 64-bit systems, but then you’re also wrong.
Very recent CPUs have 57-bit addressing; otherwise only 48 bits or fewer are available, so indexes into square arrays need only 23-27 bits for the theoretical maximum array sizes of Float64s, at 256 TB to 128 PB.
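A back-of-the-envelope check of those index sizes (the arithmetic is mine, not from any spec):

```julia
# 48-bit addressing: 2^48 bytes = 256 TB → 2^45 Float64s; a square matrix has
# side isqrt(2^45), so one index needs ceil(log2(side)) ≈ 23 bits.
ceil(Int, log2(isqrt(2^48 ÷ 8)))  # 23

# 57-bit addressing: 2^57 bytes = 128 PB → 2^54 Float64s → 27 bits per index.
ceil(Int, log2(isqrt(2^57 ÷ 8)))  # 27
```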
The upside of using 32-bit integers or smaller is huge: 2x higher memory bandwidth and free overflow checks. In many cases you can go even lower, e.g. to 16-bit for 4x the memory bandwidth.
Now for addition: adding two numbers adds only 1 bit to the result, and additions are the most common arithmetic operation. So an Int33 would be useful but a bit awkward, as you can’t store it in the same space (and it’s also bad if you subsequently need to multiply). With that type, or even just using Int64 as the result type for Int32 addition, we also avoid overflow for addition and subtraction. And if we have additional (non-power-of-2) types, we can do better.
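A minimal sketch of such widening addition/subtraction, assuming a wrapper type so Base’s own Int32 methods aren’t redefined (Fast32 is a made-up name):

```julia
import Base: +, -, *

struct Fast32
    x::Int32
end

+(a::Fast32, b::Fast32) = Int64(a.x) + Int64(b.x)  # widened result: 1 extra bit, never wraps
-(a::Fast32, b::Fast32) = Int64(a.x) - Int64(b.x)
*(a::Fast32, b::Fast32) = widemul(a.x, b.x)        # Int32 × Int32 → Int64

s = Fast32(typemax(Int32)) + Fast32(Int32(1))  # 2147483648, an Int64 (would wrap as Int32)
```

Returning a plain Int64 here just keeps the sketch short; a real package would keep tracking widths in the type domain.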
We actually do have Int16 and Int8, but it seems a bit excessive to go that low for a default type; I think that would need to be opt-in.
So what is bad about this? We of course have less range: only 65536 values for UInt16, which is restrictive, but 2097152 values for [U]Int21, i.e. plus or minus about a million for signed integers. That seems like a useful potential default, if not Int32 or Int31.
I’m not suggesting the new types need to be bit-aligned in memory; that could be a later optimization, e.g. three Int21s would fit in each 64-bit memory location (or register), as sketched below.
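As a hypothetical illustration of that packing (pack3 and unpack are made-up helpers; real support would presumably live in the compiler):

```julia
# Pack three 21-bit signed integers into the low 63 bits of a UInt64.
pack3(a, b, c) = UInt64(a & 0x1fffff) |
                 UInt64(b & 0x1fffff) << 21 |
                 UInt64(c & 0x1fffff) << 42

# Extract field i (0, 1, or 2), sign-extending the 21-bit value to Int64.
unpack(p, i) = (Int64((p >> (21*i)) & 0x1fffff) << 43) >> 43

p = pack3(-5, 7, 100)
unpack.(p, 0:2)  # [-5, 7, 100]
```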
Let’s take an example:

```julia
polynomial(x) = x^3 + 2x^2 - 1
```
For Int21 input, we get Int21*Int21*Int21 = Int63, plus 2*Int21*Int21 = Int43, minus Int21; i.e. Int63 + Int43 + Int21.
If you evaluate this polynomial left-to-right, you get Int64 and then Int65, which is bad; but right-to-left, you get Int44 first and then Int64. This is something you might optimize in your program by rearranging terms, or the compiler could, because unlike for floating point, it’s allowed to.
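Since Int21 doesn’t exist today, here is a rough sketch of the same bookkeeping with real types, using Base’s widemul (Int32 × Int32 → Int64) and grouping the narrow terms first; the formulation is mine:

```julia
# 2x^2 - 1 stays within Int64 for any Int32 input; the x^3 term is computed
# as widemul(Int64, Int32) → Int128, so nothing can wrap.
polynomial(x::Int32) = (2 * widemul(x, x) - 1) + widemul(widemul(x, x), x)

polynomial(typemax(Int32)) isa Int128  # true, and exact even at the extreme input
```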
Note also that 2x^2 need not evaluate as Int21*Int21*Int21, since the literal 2 needs only add one bit (like addition), while the literals 3 and 4 would need to add 2 bits. For the power operator ^, Julia does a similar compile-time optimization, which it seems could be applied here too, with:

```julia
@inline literal_pow(f, x, ::Val{p}) where {p} = f(x,p)
```
Most calculations do not involve powers (which I actually think should give you a float in general, except for Unsigned or positive literal integers, which can also be handled in a good way); I’m just showing that in many common cases they aren’t a problem either.
Most calculations are additions/subtractions, which only add 1 bit at a time (or multiplications, which double the number of bits). If at any point you want to store results back to memory you have a choice: the location needs to be as large as the type you ended up with (which may actually not be larger, as the expanded type can be a bit of an illusion), or you (or the compiler) need to reduce the number of bits. The extra bits are most likely to be all zeros anyway (or all ones for negative numbers; neither is a problem). And if at any point you did e.g. a division, converting to a float (or any other operation converting to a float, a rational, or a BigInt), then you also have no problem. E.g.:
```julia
dist = sqrt(x^2 + y^2)
```
That function would fail for all of Julia’s integer types (except BigInt) when you overflow, but not with my scheme for any integer type (even with Int64 inputs), because you always get a larger type before that happens; for the highest speed you want to make sure your inputs are no larger than Int32.
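A quick REPL check of that failure with today’s Int64:

```julia
julia> x = y = 2^32;

julia> sqrt(x^2 + y^2)  # x^2 and y^2 both wrap to 0 modulo 2^64; true answer ≈ 6.07e9
0.0
```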
Julia already has an Int128 type (and Int256 up to Int1024 are available in a package), but defining Int64 → Int128 is going to be slower, which is why it’s important for the inputs of multiplication to be Int32 (or smaller), and for addition/subtraction Int63 or smaller. You can go higher; it only gets gradually slower, and eventually you can end with BigInt, or better, its much faster replacement ArbNumerics.jl (or potentially floats, if wanted).
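For reference, Base’s widen already encodes this widening chain, ending in BigInt:

```julia
julia> widen(Int32), widen(Int64), widen(Int128)
(Int64, Int128, BigInt)
```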