I (we) will support saturating arithmetic in `FixedPointNumbers.jl`

as an experimental feature until the API design is mature. (If you’re interested in the background and API design, you can also see:

)

Since the addition and subtraction of fixed-point numbers is essentially the “same” as the addition and subtraction of integers, we will discuss just integer arithmetic below.

Saturating addition and subtraction can be implemented simply as follows:

```
using Base.Checked
function saturating_add1(x::T, y::T) where {T <: Integer}
r, f = add_with_overflow(x, y)
f ? (y < zero(y) ? typemin(x) : typemax(x)) : r
end
function saturating_sub1(x::T, y::T) where {T <: Integer}
r, f = sub_with_overflow(x, y)
f ? (y < zero(y) ? typemax(x) : typemin(x)) : r
end
```

```
julia> saturating_add1(typemax(UInt8), oneunit(UInt8))
0xff
julia> saturating_add1(typemin(Int8), -oneunit(Int8))
-128
julia> saturating_sub1(typemin(Int8), oneunit(Int8))
-128
julia> saturating_sub1(zero(Int8), typemin(Int8))
127
```

However they are incredibly slow.

```
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
julia> xu8 = rand(UInt8, 1000, 1000); yu8 = rand(UInt8, 1000, 1000);
julia> xi8 = rand(Int8, 1000, 1000); yi8 = rand(Int8, 1000, 1000);
julia> @btime $xu8 .+ $yu8;
85.300 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_add1.($xu8, $yu8);
434.400 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_add1.($xi8, $yi8);
679.600 μs (2 allocations: 976.70 KiB)
julia> @btime $xi8 .- $yi8;
85.600 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_sub1.($xu8, $yu8);
423.900 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_sub1.($xi8, $yi8);
680.200 μs (2 allocations: 976.70 KiB)
```

For that reason, the implementation in FixedPointNumbers.jl has been changed as follows.

(cf. https://github.com/JuliaMath/FixedPointNumbers.jl/pull/190)

```
function saturating_add2(x::T, y::T) where {T <: Unsigned}
x + min(~x, y)
end
function saturating_add2(x::T, y::T) where {T <: Signed}
x + ifelse(x < zero(x), max(y, typemin(x) - x), min(y, typemax(x) - x))
end
function saturating_sub2(x::T, y::T) where {T <: Unsigned}
x - min(x, y)
end
function saturating_sub2(x::T, y::T) where {T <: Signed}
x - ifelse(x < zero(x), min(y, x - typemin(x)), max(y, x - typemax(x)))
end
```

However, the codes for `Signed`

are still slow because it does not use the hardware saturating instructions.

```
julia> @btime saturating_add2.($xu8, $yu8);
95.500 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_add2.($xi8, $yi8);
185.001 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_sub2.($xu8, $yu8);
109.000 μs (2 allocations: 976.70 KiB)
julia> @btime saturating_sub2.($xi8, $yi8);
154.700 μs (2 allocations: 976.70 KiB)
```

Could we improve this without using low-level features like direct access to LLVM?

I think it might be worth supporting LLVM’s saturation arithmetic intrinsics in Julia v1.6.

BTW, what has happened on nightly over the past month?

```
julia> versioninfo()
Julia Version 1.6.0-DEV.1274
Commit 444aa87348 (2020-10-17 22:11 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.0 (ORCJIT, skylake)
julia> @btime $xu8 .+ $yu8;
795.400 μs (2 allocations: 976.70 KiB)
```