RFC: Compressed (base-6) floating point, taking 1/8 the space (new plan: 1/64 compression for neural networks)

As described here, this will be a storage format only, but I have other ideas for how to work with posits: implicitly, as Float64, until the values really need to be stored, to make this faster than Float32.
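For illustration, a minimal sketch of what that idea could look like (the StoredPosit wrapper below is hypothetical, only for illustration, and assumes SoftPosit.jl is available):

julia> using SoftPosit

julia> struct StoredPosit  # hypothetical: 2 bytes in memory, Float64 in computation
         bits::Posit16
       end

julia> StoredPosit(x::Float64) = StoredPosit(Posit16(x))  # convert only when actually storing

julia> Base.Float64(p::StoredPosit) = Float64(p.bits)  # promote back to Float64 to compute with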

I made a new unsigned Posit8, because 3 significant binary digits is awfully small; unsigned you get a max of 4 fraction bits, which is then at least a bit better.

This is a work in progress, and as written below it is dangerous to use (with min values stored, and I think the same holds if average values were stored instead), as I explain:

Note, the unchanged Posit8 has all integers from -16 to +16 exact, i.e. it is a superset of those numbers, which must be where I got the 2 significant decimal digits from. That’s true, but not for an arbitrary two-digit integer, e.g. 21 or 99.
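This is easy to spot-check with SoftPosit.jl (assuming its Posit8 matches the posit8 I mean here; the exact set of representable integers depends on the es parameter of the implementation):

julia> using SoftPosit

julia> [i for i in -16:16 if Float64(Posit8(Float64(i))) != i]  # should come back empty if all those integers are exact

julia> Float64(Posit8(21.0))  # 21 is a two-digit integer that is not exact, so this shows what it rounds to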

With that one more bit I suppose I get 0 to 32 exact; then base-30 is more practical, since I want 1.0 to be exact… but the base-30 fractions are only exact up to 0.9, then 1.0 and 1 1/3 (I thought I would also get 1 2/3 exact, but seemingly not), and after that no more.

Then I can no longer store the average, so I store the min number instead (and likely must store the max number too). I should be able to get some extra bits for the remaining 6 or 5 numbers. That’s an unfinished idea.

If I store both min and max, I can store them in [min, max] or [max, min] order, giving me one more bit to work with and use somewhere, and I can calculate the average of the two from that; see the sketch below. I can also scale the difference numbers to get some more fractional bits.
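A minimal sketch of that order trick, with hypothetical helper names (note it breaks down when min == max, since the order is then ambiguous):

julia> encode_pair(lo, hi, flag::Bool) = flag ? (hi, lo) : (lo, hi)  # the swap carries one extra bit

julia> function decode_pair((x, y))
         flag = x > y  # recover the extra bit from the stored order
         lo, hi = minmax(x, y)
         (lo, hi, (lo + hi) / 2, flag)  # the midrange comes for free from the two endpoints
       end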

As is:

julia> using SoftPosit  # Posit8 and Posit16 come from SoftPosit.jl

julia> UPosit8(n) = UInt8((reinterpret(UInt16, Posit16(n)) << 1) >> 8)  # drop the sign bit, keep the next 8 bits

julia> UPosit8_to_Float64(n) = Float64(reinterpret(Posit16, UInt16(n) << 7))  # put the 8 bits back at the top of a Posit16 (sign 0, low 7 bits 0)
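A worked round trip as a sanity check: Posit16(1.0) has the bit pattern 0x4000 (true for any es), so dropping the sign and keeping the next 8 bits gives 0x80, which decodes back to 1.0 exactly:

julia> UPosit8(1.0)
0x80

julia> UPosit8_to_Float64(UPosit8(1.0))
1.0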

julia> function encode_8p(tuple; base = 30)
         (a, b, c, d, e, f, g, h) = tuple .* base//2  # scale so multiples of 1/base land on exact halves
         m = min(a, b, c, d, e, f, g, h)  # store the min rather than the average: mid = (a + b + c + d + e + f + g + h) / 8
         (Posit8(m), UPosit8.((b - m, c - m, d - m, e - m, f - m, g - m, h - m)))  # note: a - m is not stored yet
       end

julia> function decode_8p((min, (b, c, d, e, f, g, h)); base=30)
         b, c, d, e, f, g, h = UPosit8_to_Float64.((b, c, d, e, f, g, h))
         min = Float64(min)
         # the first two entries are debug values: the decoded min and the sum of the diffs
         (min, b + c + d + e + f + g + h, b + min, c + min, d + min, e + min, f + min, g + min, h + min) ./ base//2
       end


An alternative that should be safe:
julia> function encode_8p(tuple; base=30, T=Posit16)
         (a, b, c, d, e, f, g, h) = tuple .* base//2  # same scaling as above, but no differences taken
         T.((a, b, c, d, e, f, g, h))  # each value stored as a full Posit16
       end
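For completeness, a matching decoder for this safe variant would just undo the scaling (my sketch; decode_8p_safe is a hypothetical name, not code from above):

julia> decode_8p_safe(tuple; base=30) = Float64.(tuple) ./ base//2  # convert back and divide out base/2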

Note, the first two values in the output are just debug values; I’ve not yet recovered the first input value, but it should be possible:

julia> decode_8p(encode_8p((-0.0, 1.3, 3.0, 4.0, 12.0, 3.0, 5.0, 0.3); base=30); base=30)
(0.0, 27.9, 1.2, 2.933333333333333, 4.0, 11.733333333333333, 2.933333333333333, 4.8, 0.3)

As you can see, 0.3 is exact as promised, and so is 1 + 1//3 (both unlike in regular floats or posits). With base = 30 the inputs are scaled by base/2 = 15, so 0.3 is stored as exactly 4.5. But if I change/lower the min value, then the values change:

julia> decode_8p(encode_8p((-5.0, 1.3, 1 + 1//3, 12.0, 1 + 2//3, 0.9, 5.0, 0.3); base=30); base=30)
(-5.333333333333333, 54.4, 0.5333333333333333, 0.5333333333333333, 10.666666666666666, 1.0666666666666667, 0.5333333333333333, 4.266666666666667, -0.5333333333333333)

e.g. 0.3 changes to -0.533, which is kind of bad… :slight_smile: I think that’s because of the nature of the posit system (plus the low precision and catastrophic cancellation: with min = -5.0 the scaled differences land near 80, where the truncated 8-bit posits are very coarse, and subtracting the large min back out amplifies the error). So working with (low-bit) posits has been illuminating, but I think I can work around this and make posits user-friendly.

No, I don’t worry about signed zero personally, and it’s the least of my worries now. :slight_smile: I was just giving information about posits in general (not just about my system based on them), asking whether others really want signed zeros. If they do, it would be a deal-breaker and I might as well abandon my investigation.