No overflow nor underflow rounding mode for Floats

Hey there,
I see from the manual that different rounding modes exist for floats. However, I was wondering whether there is a smart way to set or emulate a rounding mode that is similar to RoundNearest, but without overflow or underflow: the smallest representable number is never rounded to zero and the largest is never rounded to infinity. At the moment we have

julia> a = prevfloat(typemax(Float16))
Float16(6.55e4)

julia> a+a
Inf16

causing an overflow and

julia> b = nextfloat(zero(Float16))
Float16(6.0e-8)

julia> b/2
Float16(0.0)

causing underflow. The same happens for Float32, Float64, and negative numbers. In the first example I would like a+a to yield a, and in the second b/2 to yield b. The motivation is that 2a is closer to a than to infinity … Do you know whether there is any way to set this behaviour?

Anybody any thoughts on that?

You could create your own type which wraps a float and then implement the basic operations yourself, using whatever rounding scheme you like. Something like:

struct MyFloat{F <: AbstractFloat} <: Number
    value::F
end

Base.:+(f1::MyFloat, f2::MyFloat) = MyFloat(your_particular_rounding_scheme(f1.value + f2.value))
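As a rough sketch of what such a rounding scheme could look like for the behaviour asked about above, the following clamps results after the fact instead of changing the hardware rounding mode. The helper names (`nooverflow`, `nounderflow`, `sat_add`, `sat_mul`, `sat_div`) are made up for illustration and are not part of Base or any package. One assumption worth stating: a zero result is treated as underflow only when the exact result cannot be zero, which is why addition needs no underflow check (a sum of two floats is zero only if it is exactly zero).

```julia
# Hypothetical helpers, not part of Base: emulate "RoundNearest without
# overflow or underflow" by clamping the result of an ordinary operation.

# Map ±Inf back to ±floatmax; finite values and NaN pass through unchanged.
nooverflow(x::T) where {T<:AbstractFloat} =
    isinf(x) ? copysign(floatmax(T), x) : x

# Replace a zero that arose from underflow by the smallest subnormal number,
# keeping the sign of the (signed) zero. `underflowed` must be decided by the
# caller, since a zero result alone does not say whether it is exact.
nounderflow(x::T, underflowed::Bool) where {T<:AbstractFloat} =
    (underflowed && iszero(x)) ? copysign(nextfloat(zero(T)), x) : x

# Addition can overflow, but it never rounds a nonzero exact sum to zero.
sat_add(a::T, b::T) where {T<:AbstractFloat} = nooverflow(a + b)

# A product is exactly zero only if one factor is zero.
sat_mul(a::T, b::T) where {T<:AbstractFloat} =
    nounderflow(nooverflow(a * b), !iszero(a) && !iszero(b))

# A quotient is exactly zero only if the numerator is zero
# (dividing by ±Inf is left alone in this sketch).
sat_div(a::T, b::T) where {T<:AbstractFloat} =
    nounderflow(nooverflow(a / b), !iszero(a) && isfinite(b))

a = prevfloat(typemax(Float16))   # largest finite Float16
b = nextfloat(zero(Float16))      # smallest positive Float16

sat_add(a, a) == a                # true: no overflow to Inf16
sat_div(b, Float16(2)) == b       # true: no underflow to zero
sat_add(b, -b) == 0               # true: an exact zero is kept
```

Note that this sketch also turns division of a nonzero number by zero into ±floatmax, which may or may not be what you want.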

For interaction between your new type and Julia’s existing numbers, check out the “Conversion and Promotion” section of the Julia manual.
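To make the promotion part concrete, here is a minimal, self-contained sketch of how a wrapper type like the one above might be hooked into mixed arithmetic. The helper `clamp_inf` is a hypothetical stand-in for the rounding scheme (it only handles overflow), and the specific `convert` and `promote_rule` methods are one possible choice rather than the only way to do it.

```julia
# Wrapper around any AbstractFloat; arithmetic is redefined below.
struct MyFloat{F<:AbstractFloat} <: Number
    value::F
end

# Hypothetical placeholder rounding scheme: map ±Inf to ±floatmax.
clamp_inf(x::F) where {F<:AbstractFloat} =
    isinf(x) ? copysign(floatmax(F), x) : x

# Arithmetic between two wrapped values of the same underlying float type.
Base.:+(x::MyFloat{F}, y::MyFloat{F}) where {F<:AbstractFloat} =
    MyFloat(clamp_inf(x.value + y.value))
Base.:*(x::MyFloat{F}, y::MyFloat{F}) where {F<:AbstractFloat} =
    MyFloat(clamp_inf(x.value * y.value))

# Converting an ordinary real number wraps it, clamping values that are
# too large for F (assumption: conversion should saturate as well).
Base.convert(::Type{MyFloat{F}}, x::Real) where {F<:AbstractFloat} =
    MyFloat{F}(clamp_inf(convert(F, x)))

# Mixing MyFloat{F} with any Real promotes both operands to MyFloat{F}.
Base.promote_rule(::Type{MyFloat{F}}, ::Type{<:Real}) where {F<:AbstractFloat} =
    MyFloat{F}

x = MyFloat(floatmax(Float16))
x + x                    # stays at floatmax(Float16) instead of Inf16
MyFloat(Float16(1)) + 2  # the Int is promoted to MyFloat{Float16} first
```

With the promotion rule in place, Base's generic fallbacks such as `+(x::Number, y::Number)` call `promote` and then dispatch to the methods defined for two `MyFloat{F}` values, so mixed expressions do not need their own methods.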


Thanks for pointing me in this direction, I’ll give this a try!

I also vaguely remember there being a helpful package designed to make it a bit easier to create custom Number types, but I can’t remember what it’s called (nor can Google find it). Perhaps someone else here will remember…

You can try FiniteFloats.jl. Infinities are avoided, as are many NaNs. I do not avoid zeros, so prevfloat(nextfloat(0.0)) will be 0.0. Please open an issue if you run into any problems.
Please let me know if you find it useful.

That looks amazing! I’ll go and see whether this fits my purpose and then come back to you. Thanks!