The stackoverflow post you linked is pretty good already.
Because there's no hardware support and no standard for it.
Is it? What problems do you have in mind? What real number problems never involve negation or subtraction (see below)?
There are many properties that the unsigned integer format has which are not possible for an "unsigned floating point format":

1. Unsigned int and signed int have the same representation. Not possible with the existing signed floating point type, since it's not stored as two's complement. In fact, signed int is a special interpretation of unsigned int, not the other way around. I'm not aware of a way to make a signed floating point format based on an unsigned version with easy-to-understand semantics.
2. Unsigned int has well-defined wrapping behavior (related to 1). Not possible for a floating point format (not in any simple way I can come up with, at least).
3. Negation and subtraction are well defined for/between any unsigned int(s). A direct consequence of 2. Not possible with floating point. (See the sketch below this list.)
4. Unsigned int is used to represent addresses and sizes, so computers naturally need them. There isn't a need like this for floating point.
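A minimal Julia sketch of points 1–3 (the literals are picked just for illustration):

```julia
# 1. Signed and unsigned ints share a representation; reinterpret only
#    changes how the same 8 bits are read.
reinterpret(Int8, 0xff)       # -1 (two's complement reading of 0xff)
reinterpret(UInt8, Int8(-1))  # 0xff

# 2. Well-defined wrapping (modular) behavior.
typemax(UInt8) + 0x01         # 0x00

# 3. Negation/subtraction are defined for any unsigned ints.
0x01 - 0x02                   # 0xff, i.e. -1 mod 2^8
-0x01                         # 0xff
```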
Also note that the OP in the Stack Overflow question isn't really asking for a different format; he is really asking for a range check, which is a somewhat useful feature to have in languages and is possible in Julia with custom types. (There was a very recent thread about it here.)
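For example, a sketch of what such a range-checked custom type could look like (the name `NonNegative` and this tiny interface are made up here, not taken from the linked thread):

```julia
struct NonNegative{T<:Real} <: Real
    x::T
    function NonNegative{T}(x) where {T<:Real}
        x >= 0 || throw(DomainError(x, "value must be non-negative"))
        new{T}(x)
    end
end
NonNegative(x::T) where {T<:Real} = NonNegative{T}(x)

Base.:+(a::NonNegative, b::NonNegative) = NonNegative(a.x + b.x)
Base.:-(a::NonNegative, b::NonNegative) = NonNegative(a.x - b.x)

NonNegative(1.5) - NonNegative(2.0)  # throws DomainError: the "range check" the OP seems to want
```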
That used to be the case, and while I think there's no standard (yet), I believe there's Chinese hardware with UE8M0 already (and it's supported by Nvidia in PTX; not sure about actual hardware support as a type, maybe only in intermediate code?):
DeepSeek's UE8M0 FP8 represents a breakthrough in efficient AI training, enabling massive language models to run on alternative hardware while maintaining competitive performance. This specialized 8-bit floating-point format trades precision for unprecedented efficiency, allowing models like DeepSeek-V3.1 to be trained without relying entirely on expensive Nvidia hardware.
So the concept is there, it's just a question of how Julia should support or adapt to it… is it a trait now? Real and UReal abstract types?
The type is, I think, never used on its own for AI, but it could be…
Hmph, isn't UE8M0 just rebranding integer arithmetic? As in, couldn't they take the same Int8 circuits from a MOS 6502 and call them the new floating point?
No, they're not trolling, it's literally supported by NVIDIA and used by DeepSeek! Ironic, because I could have been a BASIC programmer in the 70s, when you had to emulate floating point by hand. (Also, embedded people had to do it until fairly recently.) Having been laid off since the 80s, I could be rehired as the expert in this new-fangled AI technique!
More seriously, I guess hardware now does fast math with almost arbitrary precision, while complicating the software. Except hopefully you still use high-level PyTorch and it gets compiled into appropriate kernels with UE8M0 that nobody has to read. It's just weird because floating point units were such an advance precisely because they handled everything.
I thought I couldnāt keep up with modernity, but maybe I can. Neural networks were antithetical to AI, now they define AI. Logistic regression was hopelessly out of date, now itās deep learning (especially if you use SGD). Integer is the new floating point, slide rules are the new calculators. I just wish I had saved my flared jeans from the 70s.
No, not exactly. Note that the (Signed and) Unsigned Julia abstract types assume integers, and all integers in a range (and two's complement for Signed), so they can't be used here. UE8M0 is unsigned, yes (i.e. it represents half the number line, or rather up to its very limited typemax), but its values are still floating-point reals, e.g. 1, 2, 4, 8 etc., though nothing in between, since there's no mantissa at all, which is also unusual. I assume, but haven't confirmed, that you can represent 0, and having floats without 0 is interesting, so I would also like such a type. It most likely has an Inf, maybe only one bit pattern for it, shared with NaN and -Inf.
It's not too hard to emulate with integers, yes (even easier than old-style 8-bit micro/Microsoft floats), since it seems to be close to base-2 logarithmic number systems, if not exactly the same.
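A rough sketch of that emulation, assuming an 8-bit biased exponent with bias 127 (my guess; I haven't checked it against the spec), where "multiplication" is just integer addition of the stored exponents:

```julia
struct UE8M0
    bits::UInt8   # the whole format is just a biased exponent
end

value(a::UE8M0) = 2.0^(Int(a.bits) - 127)   # decode to Float64 for inspection

# "FP multiplication" = integer addition of exponents
# (saturation/NaN handling omitted for brevity)
Base.:*(a::UE8M0, b::UE8M0) = UE8M0(UInt8(Int(a.bits) + Int(b.bits) - 127))

a = UE8M0(0x80)   # 2^1 = 2.0
b = UE8M0(0x82)   # 2^3 = 8.0
value(a * b)      # 16.0
```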
This is too specific to fit into Julia's generic scientific programming setup (the precision-speed trade-offs are quite extreme), and in any case we are talking about shifting a calculation into logarithms, a very old trick. Even in the ML context, there is probably no point in training with this type and then calculating with another, more precise one, as the results might be just way off.
If someone really needs this, it should be put into a package, defined using a primitive type, which can be hooked into LLVM if/when the support arrives.
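Something like the following, as a sketch of that package route (pure-Julia fallbacks only, with a bias of 127 assumed; actual LLVM/hardware support, if it arrives, would replace them):

```julia
primitive type UE8M0 <: Real 8 end

UE8M0(x::UInt8) = reinterpret(UE8M0, x)   # construct from the raw bit pattern
bits(x::UE8M0)  = reinterpret(UInt8, x)

# Fallback conversion; an LLVM intrinsic could replace this later.
Base.Float64(x::UE8M0) = 2.0^(Int(bits(x)) - 127)
Base.show(io::IO, x::UE8M0) = print(io, "UE8M0(", Float64(x), ")")
```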
They may represent reals, but they're just bits manipulated like integers, just as with all floating point exponents. I was just amused because this is way less than true floating point. In fixed point, addition is easy, and there's little difference between fixed point and integer. An FPU is much more complex, taking way more surface area than an ALU. Multiplication is just a fixed point multiply of the mantissas and integer addition of the exponents; sounds easy? No, because there's a ton of circuitry to track and move the radix point. I'm mostly joking, but this new "floating point" UE8M0 is less than an integer arithmetic unit, since all you need is integer addition to do "FP multiplication", and the mantissa is somebody else's problem.
My point is that Int8 0x2 + 0x2 = 0x4 is all you need to multiply two UE8M0s. AFAIK addition of UE8M0s isn't implemented in hardware. It's kind of like how a slide rule is great for multiplication but terrible for addition. Happy to be corrected if I'm misinterpreting; I merely scanned the docs.
I don't mean Julia should support this the way it does Float16; I mean, does Julia need only a UReal abstract type (to enable the rest in a package, and it's better to have the abstract type in Base, as with most others)? That's a really cheap addition. Or does the standard Real (abstract type) not really care whether negative numbers can be represented or not? Or whether zero is absent (which I don't see clearly confirmed; 0x00 could be zero, or it could be in some designs)? Or even Infs; this has only (one) NaN.
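Concretely, the cheap addition would be just something like this (a sketch; the name and placement are of course up for debate):

```julia
abstract type UReal <: Real end   # non-negative reals, analogous to Unsigned <: Integer

# a package could then define, e.g.:
# primitive type UE8M0 <: UReal 8 end
```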
Not really; it depends on the size of the floats, and the complexity scales down as they get smaller. Add/subtract is simple with an ALU, and since this format is basically logarithmic, you use that, plus some check for exceptional values.
Multiplying Float64 is, I believe, already simpler than multiplying UInt64, since the mantissa is smaller than 64 bits. And Float32 has better range and performance than Int64. Multiplying didn't use to be single-cycle, since it's inherently more complex, so they throw transistors (basically lookup tables, not sure if even only one) at the problem to get single-cycle. For this format, addition gets more complex, but it can be done with a 256-byte lookup table if I recall, or at worst with a 256*256 lookup table for any binary op, and 256 bytes for e.g. square root.
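To illustrate the lookup-table point with a toy example (using an E8M0-style decode with an assumed bias of 127, not any official FP8 spec): precompute every binary op as a 256×256 table and every unary op as a 256-entry table, and arithmetic becomes indexing.

```julia
decode(b::UInt8) = 2.0^(Int(b) - 127)
encode(x::Float64) = UInt8(clamp(round(Int, log2(x)) + 127, 0, 255))

# 256*256 entries (64 KiB) cover "addition" of any two 8-bit codes
const ADD  = [encode(decode(UInt8(i)) + decode(UInt8(j))) for i in 0:255, j in 0:255]
# 256 entries suffice for a unary op like square root
const SQRT = [encode(sqrt(decode(UInt8(i)))) for i in 0:255]

add(a::UInt8, b::UInt8) = ADD[Int(a)+1, Int(b)+1]
decode(add(0x80, 0x80))   # 2.0 + 2.0 rounds to 4.0
```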
Yes there is, it's in the article. I believe, though, that it's never used on its own (it's used for a shared exponent scale, I believe, alongside what I guess are many regular FP8, FP6 or even FP4 values; not sure which they use), and not a different format for inference, but the same one.