Posit Standard (2022): Floats/IEEE vs Posits, and ternary math

I think we can make posits faster than regular floating-point, in software, on regular CPUs like x86 and ARM, at least for the vast majority of operations (the ones that matter most).

The trick would be to rely on Julia’s type system to help, exploiting the fact (from a code comment) that “Posit8, 16 are subsets of Float32” and “Posit32 is a subset of Float64”.

Right now:

julia> Posit8(1)+Posit8(1)
Posit8(2.0)

but if we defer rounding and conversion back to Posit8, we can just use Float32 for the calculations:

julia> Posit8(1)+Posit8(1)
Posit8_as_Float32(2.0)

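As a rough sketch of what I mean (the wrapper name Posit8AsFloat32 and the helper to_posit8 are just made up here, and I’m assuming SoftPosit.jl’s Posit8 <-> Float32 conversions), the arithmetic could stay entirely in Float32 and only round back to Posit8 when explicitly asked:

using SoftPosit  # assumed to provide Posit8 and its Float32 conversions

# Hypothetical wrapper: the value lives as a Float32; rounding to Posit8 is deferred.
struct Posit8AsFloat32
    x::Float32
end

Posit8AsFloat32(p::Posit8) = Posit8AsFloat32(Float32(p))  # decode once, on load
to_posit8(p::Posit8AsFloat32) = Posit8(p.x)               # round once, on store

# Arithmetic stays in Float32; no per-operation rounding back to Posit8.
Base.:+(a::Posit8AsFloat32, b::Posit8AsFloat32) = Posit8AsFloat32(a.x + b.x)
Base.:-(a::Posit8AsFloat32, b::Posit8AsFloat32) = Posit8AsFloat32(a.x - b.x)
Base.:*(a::Posit8AsFloat32, b::Posit8AsFloat32) = Posit8AsFloat32(a.x * b.x)
Base.:/(a::Posit8AsFloat32, b::Posit8AsFloat32) = Posit8AsFloat32(a.x / b.x)
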
The 128-bit quire for Posit8 is irrelevant here (at least for that operation), because most of the defined arithmetic isn’t specified to return a quire, take one as an input, or, it seems, use one as an intermediate.

When you load from memory you do have to decode the Posit8, yes, but loading from memory is slow anyway, and the 2x-4x higher bandwidth requirement of regular floats should make up for it.

Then you’re just doing the same float calculations thereafter, hopefully a few of them, before you need to store to memory, and only then do you round. You don’t round as often, so you won’t get bit-identical, standard-conforming results, but I think this software solution should be more accurate, with less rounding.

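For example, a small kernel (using the hypothetical wrapper sketched above) would decode once per load, do its Float32 arithmetic, and round back a single time at the store:

# Hypothetical usage of the wrapper above: one decode per load, several
# Float32 operations, one rounding back to Posit8 at the store.
function scale_add!(y::Vector{Posit8}, a::Posit8, x::Vector{Posit8})
    aw = Posit8AsFloat32(a)
    for i in eachindex(x, y)
        xi = Posit8AsFloat32(x[i])       # decode
        yi = Posit8AsFloat32(y[i])       # decode
        y[i] = to_posit8(aw * xi + yi)   # round only here
    end
    return y
end
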
A similar trick could actually also be used for Float16 (it was thought of as a storage-only format, though that is changing and CPUs are starting to support it natively):

julia> Float16(2)+Float16(2)
Float16(4.0)

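To see what deferring the rounding changes, here is that idea spelled out for Float16 in plain Base Julia (no posits involved): rounding after every operation versus computing in Float32 and rounding once at the end.

# Julia already rounds each Float16 operation back to Float16:
per_op   = Float16(0.1) + Float16(0.2) + Float16(0.3)

# Compute in Float32 and round to Float16 only once, at the end:
deferred = Float16(Float32(Float16(0.1)) + Float32(Float16(0.2)) + Float32(Float16(0.3)))

# The two disagree for this input, and the deferred result, having been
# rounded only once, is the one closer to 0.6.
(per_op, deferred)
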
A potential option for (these new) posits, and for floats too, is to work in a wider format; the current approach is slow because you’re forced to convert and round on every single operation.

In general, if you multiply, say, an 8-bit value by an 8-bit value you get a 16-bit result; doubling the number of bits guarantees you don’t need any rounding (unless you do it repeatedly, and FPUs round anyway).

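For integers Julia exposes exactly this via widemul, which returns the full double-width product:

a, b = Int8(100), Int8(100)
a * b          # Int8(16): the 8-bit product overflows and wraps
widemul(a, b)  # Int16(10000): twice the bits hold the exact product
# The same holds for floats computed in a wider format: two 11-bit Float16
# significands multiply into at most 22 bits, well within Float32's 24.
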
If you instead add (or subtract) two 8-bit values you need only 1 extra bit, assuming you’re working with integers (or fixed-point), not floats. For floats you have possible catastrophic cancellation (with subtraction), but I don’t see it getting any worse with my idea. The quire is, it seems, meant to mitigate that, but you need to opt into using it (and it’s no longer supported in the pure-Julia SoftPosit.jl).

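For the cancellation point, a quick plain-Float32 illustration (nothing posit-specific): the damage comes from the rounding of the inputs, before any subtraction happens, so deferring the final rounding can’t make it worse.

a = 1.000001f0   # rounds to 1 + 8*2^-23 ≈ 1.0000009537f0
b = 1.0f0
a - b            # 9.536743f-7, not 1.0f-6: the subtraction itself is exact,
                 # but the leading digits cancel and only input rounding error remains
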
If you want the quire, there are only the 10 operations of

5.11 Functions involving quire value arguments

to support (though the quire is likely also used indirectly in “5.5 Elementary functions of one posit value argument”, or is that avoidable?). And it seems it need not be too slow to support, and it’s only a problem if you use it often, e.g. in a loop.

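I haven’t tried implementing them, but as a very rough, deliberately non-conforming sketch of the shape of one of them (qMulAdd-style accumulation): a wider float can stand in for the accumulator, though a real Posit8 quire is a 128-bit fixed-point value whose sums are exact, which this is not. The names and the Posit8 <-> Float64 conversions are assumptions on my part.

# NOT a real quire, just an illustration: Float64 stands in for the 128-bit
# fixed-point accumulator, so long sums are not exact (individual Posit8
# products are, though, since they fit easily in Float64).
mutable struct FakeQuire8
    acc::Float64
end
FakeQuire8() = FakeQuire8(0.0)

qmuladd!(q::FakeQuire8, a::Posit8, b::Posit8) = (q.acc += Float64(a) * Float64(b); q)

quire_to_posit(q::FakeQuire8) = Posit8(q.acc)   # one rounding, at the very end
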
The only function in the following section doesn’t seem bad either, as the result will fit in a CPU float:

5.7 Functions of three posit value arguments
fMM(posit1, posit2, posit3) returns posit1 × posit2 × posit3, rounded.
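
E.g., hedging again on the SoftPosit.jl conversion names, and noting this only covers the small sizes (for Posit32 the three-way product can need more than Float64’s 53 significand bits), fMM for Posit8 could be:

# fMM sketch for Posit8 via a wider float: Posit8 significands are at most
# 4 bits and exponents at most ±24, so the triple product is exact in Float64
# and there is a single rounding at the end, as the standard asks for.
fMM(p1::Posit8, p2::Posit8, p3::Posit8) = Posit8(Float64(p1) * Float64(p2) * Float64(p3))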