[RFC] What should the arithmetic within the FixedPointNumbers be

My comment was not about performance — fixed-point arithmetic is reasonably fast even without specialized hardware support, as @tim.holy’s comment above demonstrated. The basic issue I have with fixed-point arithmetic is that (by construction) it requires careful attention to scaling and is extremely susceptible to catastrophic errors; in contrast, the roundoff model of floating-point arithmetic is much more straightforward (though of course there are corner cases where one still needs to be careful).

Even just averaging an array of numbers is tricky to do with reasonable accuracy in fixed point, whereas with floating-point arithmetic very simple algorithms have good properties for well-conditioned sums.

Fixed-point arithmetic made sense on old embedded CPUs that lacked floating-point hardware; it seems lot less useful to me now (on general-purpose CPUs) except as a storage format for carefully scaled values (e.g. RGB data).

That being said, I’m not much of a fixed-point user myself, so I agree with @tim.holy that the design of FixedPointNumbers should be tailored by applications that actually need it.


Option 2 seems the most plausible solution to me in the near future because of the consistency. The wrapping_*, saturating_* and checked_* operations provide very fine grained control on how FixedPoint can be calculated, which is great.

My main concern is to have FixedPointNumbers more transparent to users, and even to developers. Can we gain performance boost by tweaking between different behaviors and types? Yes. Should we care about them all the time? I don’t think so, that’s why in many of the cases in JuliaImages, we’re using float.(img) to avoid the overflow issue by, of course, conceding the extra and unnecessary memory allocation.

In my own use case, if there isn’t a better replacement to float.(img), I probably would stick to the float point world and leave FixedPoint as a pure storage type. Direct calculation on it without promoting to float types would be considered suspicious in my own experience.

Option 5 gives a more promising solution to retire float.(img). That’s why I like it.

While extra care is desired to avoid catastrophic precision loss, they do let you very efficiently have no precision loss for addition/subtraction, which is a really nice property in some contexts.

This is no different from floating-point arithmetic. If two values are scaled the same way and don’t saturate the precision, there is no error for floating-point addition/subtraction. 112342342.0 - 62345325.0 is computed exactly in Float64 arithmetic, for example.

The only thing that floating-point loses compared to fixed-point is that, for the same word size, it sacrifices a few bits of precision (from the significand) to put into the exponent. So, for example, Float32 arithmetic has a few less digits of precision than 32-bit fixed-point, but in return it has vastly greater dynamic range.


Perhaps this is a tautology. Accumulation of floating point numbers can easily saturate the precision (unless you are targeting only Fixed and not Normed).

There is a trade-off between floating-point and fixed-point, and I don’t think it’s possible to say which is more appropriate for a computational type.

I think your argument is reasonable.

However, I prefer to provide multiple means of satisfying individual demands rather than introducing a comprehensive method of satisfying the demand at large.

There is another application for FixedPointNumbers besides image processing: signal processing.
Analog to Digital Converters (ADCs) usually provide fixed point numbers instead of floating point numbers.
I tried to use FixedPointNumbers about a year ago, but I dropped it because of performance reasons. Instead I developed my own fixed point arithmetic here: https://github.com/JuliaGNSS/GNSSSignals.jl/blob/master/src/carrier.jl#L123
However, I would prefer to use FixedPointNumbers and I am happy to hear that the performance has improved since then.
I think that for fixed point numbers the possibility of an overflow is even more obvious than for the plain integer type. Since Julia uses the unchecked integer arithmetic, I would suggest that FixedPointNumbers does the same. If anyone wants to have checked fixed point arithmetic, he or she could use @checked.


I’m not strongly opposed to changing the default to floating-point arithmetic in the future, but that should only be done after tools like @checked are in place.

I’m tired of comparing the advantages and disadvantages of each proposal. :sweat_smile:
More effort should be put into identifying the cases where the change would be detrimental and considering ways to resolve the specific problems.

In terms of image processing, I have two experiments underway. One is to stress-test the JuliaImages packages with new storage strange color types. The other is to systematize a workflow for fixing bugs using Cassette.jl.

1 Like

No more easily than fixed point arithmetic with the same number of significand bits, and the consequences are less severe.

If you add a few fixed point numbers for which every digit is nonzero (i.e. the precision is saturated), then the result is quite likely to be catastrophic overflow. If you add a few floating-point numbers with the same scaling (representable with the same exponent) for which every significand digit is nonzero, then you get a small roundoff error.

To get error-free addition of several fixed-point numbers, you generally need some significand digits to be zero so that you have room to accumulate. Under the same circumstances for equally-scaled floating-point numbers, the addition will also be exact.

(Subtraction of two equally-scaled floating-point numbers is also exact.)


You seem to be making a general comparison between integer and floating-point. Perhaps the key type here is Normed. I have done an experiment with a type called NormedFloat, but it is conceptual and has no practical use.

Again, I don’t want to argue any further about whether a fixed-point or a floating-point is better.

I’m sorry I couldn’t proceed with the discussion well.

I say I don’t want to compare the pros and cons here. The reason is never that it’s useless. To show the advantages of one proposal and to show the disadvantages of another proposal are essentially different things.

FixedPointNumbers is a low-level package, and there are several intermediate layers before its functionality reaches the user. Also, there are potential variations in how to improve the user experience. It is scientific etiquette to make comparisons under the same conditions. However it implicitly pins the comparison target to a particular layer.

It’s correct in some contexts that N0f8 isn’t a computational type because it’s prone to overflow, but that is not always correct. We may be able to change the inputs or formulas so that they do not overflow in some contexts. In another context, we may be able to take action after the overflow (e.g. 2-pass methods). Those measures may be taken outside of julia (e.g. IDEs, documentation or workshops).

We need to consider many things, but we also have the potential for improvement. On the “top” of that, I want to think about what the “bottom” FixedPointNumbers can do.

1 Like

Some companies (including one I recently worked at) use Matlab or C to model the fixed-point arithmetic in FPGAs and ASICs when generating test vectors. E.g. helping to design and debug hardware such as the LTE modem in your phone or in a cell tower. Though some try for bit-exact modeling, others go only for bit-close. An example of a format I believe FixedPointNumbers.jl does not support is an 18-bit number with a sign bit and 17 fractional bits. My (possibly wrong) impression is that the register size in FixedPointNumbers.jl is at present constrained to be one of Julia’s normal integer values, so an 18-bit fixed-point number is not (directly) allowed. There’s more that could be said about this, but perhaps this gives you the idea.

Once one has a low-level software-based model of the fixed-point hardware, and you’ve debugged the hardware to some adequate degree, this model can be used to replace a floating-point model of what the hardware is supposed to be doing with the fixed-point model and check the high-level performance. If Julia were to be used in this last performance-modeling scenario, it would be helpful to have fast arithmetic.

I am not working on this at present, so please don’t add this for me, yet perhaps this “user story” will be helpful for someone, even if it’s a bit of a tangent from the rest of the thread.


I think that people consider N0f8 & friends problematic for computation because even though one may eliminate or alleviate some numerical problems by careful analysis and manual rearrangement, this is not something that scales well for any nontrivial calculation.

Personally I don’t ever use fixed point representations, so my opinion should not matter for this package. But, FWIW, I think that falling back to floats may be a reasonable default, with the understanding that you are not really supposed to do arithmetic with these types, but if you happen to do it, you will still get sensible results.

If, for some reason you want to go back to fixed point types, that should be an explicit step you design for in the code, ideally with some thought given to numerical error.


This is not yet conclusive, but considering the introduction of Option 5 or its variants (in the future), I think the checked_*/wrapping_*/saturating_* methods should be packed into a separate package or at least a sub-module (like FixedPointArithmetic).

The checked_* is defined in Base.Checked, so it doesn’t matter where the implementation is located. However, everything else is currently defined in FixedPointNumbers, so we need to have a more stable place to define the APIs.

With the help of @tim.holy, CheckedArithmeticCore is currently awaiting registration. I will opene an issue to add the definitions to CheckedArithmeticCore.

What this means, as the OP says, is that v0.9.0 will take a bit longer to release, and that @wrapping and @saturating might be added as the variations of @checked in the near future.

Edit: xref: https://github.com/JuliaMath/CheckedArithmetic.jl/issues/9

1 Like

Thanks! FPN does implement

julia> Q14f17
Q14f17 (alias for Fixed{Int32, 17})

which kind of gets at that type you were asking about. However, it won’t overflow in the same way (it has 14 “integer” bits in addition to the 17 fractional bits), and perhaps that’s the distinction you’re making. One could mask out the integer bits after each operation, and you could probably automate that by creating a type-wrapper around it. But that isn’t supported out-of-the-box.


I think you might have hit on the right answer: what about putting the arithmetic in a separate package? What if we had

  • FixedPointWrappingArthmetic
  • FixedPointSaturatingArithmetic
  • FixedPointCheckedArithmetic
  • FixedPointFloatArithmetic

and the user/developer just picked one? The main issue I foresee is we now need some mechanism of package exclusion: since they would all override Base operators, we can’t have ImageCore picking FixedPointFloatArithmetic while some other simultaneously-loaded package picks FixedPointCheckedArithmetic.

In the modern world all these could live in the same repository, so it would ensure they all remain closely coupled.

Splitting out the arithmetic and defining in a secondary package is technically type-piracy, but it’s that closely-coupled form of piracy that I’ve long argued has a valid place in the ecosystem.

The lack of composability seems problematic here. Would it be so bad to have a type parameter indicating the desired kind of arithmetic?


That’s probably better. More complex and might break signatures in downstream packages, but we’re expecting to need to do a pretty big upgrade anyway when we finally decide how to handle this.

I think it might be a good idea to sort out what could cause a conflict before discussing specific ways for isolation.

1. function name

I think it’s reasonable to assign different function names to different arithmetic and different operations.
That is, checked_add, saturating_add and saturating_sub should all be unique function names.
(The names themselves are debatable, but the uniqueness is the key here.)

Julia allows you to access functions in a variety of ways. For example, we can define wrapper functions such as add(x, y, Val(:checked)) and checked_op(Val(:add), x, y ). In fact, the argument in the OP is not “how” to implement an operator, but which function to assign to the operator.

Also, we have a powerful meta-programming feature.

The reason I value the uniqueness of the function names is that if we decide to introduce a more abstract API, it will interrupt my current plans on CheckedArithmetic or cause a significant rework.

2. operator symbol

Syntactically, operator symbols and function names have almost the same meaning, but semantically they are quite different. (Edit: E.g. the difference between + and - is the difference in function names, not the difference between addition and subtraction.) In other words, there are only a limited number of symbols that make sense for a given function. Also, the operators we often use are unary or binary, and if we try to use more than three arguments (i.e. operands), they lose their syntactic charm. The conflict and ambiguity of operators occur in many places, not just in Julia.

Therefore, if you want the same operator symbols to have different functions, we need to differentiate by the (static) types of operands (i.e. multiple dispatch), or provide new types which contain “dynamic” metadata. The latter may seem impractical from a performance standpoint, but I think it’s a feasible approach for “containers” like images.

Another approach is to introduce a DSL for numerical expressions. @checked is a kind of this approach, in the sense that it replaces the Julia-native operators.

Another solution, which is distinct from the above, is to not use the same symbols. This idea is (jokingly) brought to us by @tim.holy. I don’t think the new operator symbols will be acceptable to all people, but I think it’s worth considering for use in conjunction with another measure, as syntax sugar.

3. type name

If we look at the relationship between the operator and the operands in the opposite direction, we can consider that the operand types are in conflict.

4. promotion rule

Currently, FixedPointNumbers “directly” overload the operators (e.g. +(::X, ::X) where X <: FixedPoint), but we can also force a promotion phase by removing the direct overloading. In that case, the promotion rules can conflict. Differences in promotion rules do not manifest themselves as differences in user codes lexically. Although this is an advantage in terms of abstraction, it can cause major confusion.

I think these concepts are not necessarily orthogonal and probably not exhaustive.

You mean the other way round, right?

In any case, another solution could be sticking to the names checked_add, saturating_sub etc, and have a macro

@arithmetic_mode checked begin

that walks the entire expression, replacing + with checked_add etc. Maybe @checked could be a shortcut for @arithmetic_mode checked.

This would preserve syntactic clarity.