Hello,

there is some activity to implement different number formats other than `Rational`

and `float`

. All approaches known to me so far try to tackle a fundamental tradeoff:

We want our calculations to be **fast** and **accurate**. However, on real computing machines, we cannot have both.

Every digital design textbook treats number representation as Int and IEEE 754 floats, but most of them dont mind (software) approaches to more accurate (=rational) alternatives.

The purpose of this topic is to prepare a part of documentation or report, which

gives users / programmers / scientists advice for selecting the best number representation / package for their julia project and potentially steer future efforts in implementing number formats in julia.

My idea is to collect existing packages here, discuss background information and categorize / generalize different existing approaches.

(Maybe some digital design basics are needed as background information, as computer engineer Iâ€™m surely biased.)

Contributions are very welcome!

In the following, *dynamic* is a measure of the greatest and smallest values the number types can represent. If we only have values close together in our calculations, we need low dynamic. If we deal with values which differ by orders of magnitude, we need higher dynamic.

We are used to do arithmeric over the real numbers. But with a finite number of bits we can only

represent rational numbers exactly. Assume that we can represent all reals of interest with big or small enough numerators and denominators.

Note that with finite number of bits we loose associativity and commutativity, so the compiler is in general not allowed to reformulate our calculation order.

## Fixed point numbers

(fast side of the medal)

With a fixed â€śdecimalâ€ť point or fixed denominator, the underlying representation (bits) can be simplified to just integers. Artihmetic for integers is fast.

We can only add two fixed point numbers, if they have the same denominator or fractional part. The julias typesystem can be used to ensure we dont mix different fractional part up.

Multiplication is the same as for integers, often we want a rescaling afterwards.

When we choose the denominator or fractional part as a power of 2, the expensive rescaling becomes a trivial bit shift operation.

The dynamic is fixed, so calculations may easily produce values which will not fit the to the input representation. The so introduced errors (over-/underflow and truncation / roundoff) need extra attention.

Often, but not always, you have some a priori knowledge of the domain / inputs, which helps in the selection of needed number of bits and selection of introduced error behavior.

Trading performance for accuracy, one may also dynamically check if an error occured and take some counter measures.

- with power of 2 denominator:
`FixedPointNumbers.jl`

## Rationals

(more accurate side of the medal)

When we allow the denominator to also change, we can greatly increase the dynamic. So we can represent a broader range of values. Without restrictions on the values of numerator or denominator, the represented value V=n/d is exact (as with `Base.Rational{BigInt}`

)

Artihmetic now needs some extra steps you may recognise from elementary school: E.g. if we add two numbers: First we need to calculate a common denominator, than rescale the numerators, add them and rescale / reduce the result. The timely part here is the rescaling / reduction step: For the reduced representation one needs to calculate the greatest common denominator (GCD) of numerator and denominator (no hope for significant speedups here).

Some packages try to avoid the GCD part as long as they can or avoid it completely (allowing silent errors).

- always do GCD:
`Base.Rational`

- always skip GCD:
`SimpleRatio`

from`Ratios.jl`

- ~GCD only when needed:
`FastRationals.jl`

## Floating point numbers

For floating points we restrict the denominator to powers of two. This allows as to largely increase the dynamic, as we can use an integer e to represent 2^e for the denominator.

The extra steps to obtain common denominators are still needed. But the restriction to powers of two denominators, these procedures can be implemented nicer in digital hardware than for the general case.

The IEEE 754 standart was introduced as a common, machine-indepentent representation. This enables (hardware) implementations of the mentioned extra steps, so users dont need to think about any floating point representation details by default.

On all (?) desktop computers today we have supporting hardware which allows reasonably fast artihmetic with IEEE 754 floats. However, an floating point operation may take say 2xâ€¦10x longer than an integer / fixed point operation.

## Logarithmic numbers

Numbers can also be represented by the logarithm of their magnitude. Such a representation has really high dynamic from a fixed number of bits. They are used in domains where relevant values lie several orders of magnitudes apart, e.g. in some signal processing, probability or machine learing.

Operations like multiplication, division, roots or powers are relatively cheap.

But the steps needed to perform addition and substraction are much more expensive than for floating points.