# Understanding order of operations and overflow problems

Say I have the following 4 arrays

```julia
p=Float16[262.0, 289.5, 4.71]
tb=Float16[9.42, -3.555, -7.098]
ts=Float16[357.5, 306.5, 294.8]
nc=UInt16[0xf43a, 0xfff7, 0x312f]
```

and I then want to compute
`floor.(UInt16, (p.-tb)./ts.*nc )`

Resulting in

```
ERROR: InexactError: trunc(UInt16, Inf)
Stacktrace:
 trunc
@ .\float.jl:695 [inlined]
 floor
@ .\float.jl:294 [inlined]
 getindex
 macro expansion
 macro expansion
@ .\simdloop.jl:77 [inlined]
 copyto!
 copyto!
 copy
 top-level scope
@ REPL:1
```

Which, fair enough: I probably should have added another parenthesis, knowing that the result of the division should always be bounded to lie in the range [0, 1], which should be valid:

`floor.(UInt16, ((p.-tb)./ts).*nc )`

I get the same error. But if we break it down, then we know that

```julia
julia> (p.-tb)./ts
3-element Vector{Float16}:
 0.7065
 0.956
 0.04007
```

and

```julia
julia> 0.956*nc
62643.812
```

so

```julia
julia> floor(UInt16,0.956*nc)
0xf4b3
```

Why, then, will the following not work?
`floor.(UInt16, ((p.-tb)./ts).*nc )`

I assume (based on the stack trace) this has something to do with broadcasting, order of operations, and a fundamental lack of understanding on my part regarding what is going on under the hood when I multiply the Float16 and UInt16 together.

I encountered this as part of a larger timing test in which I was using Distributions.jl to produce random numbers uniformly over the full range of UInt16. I do not actually expect nc to be near the type max often (or at all), but now I want to understand why I can't multiply the two things together without producing an Inf.

Is there a way to re-arrange the operations or otherwise perform the same math in a type safe manner?
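To answer my own question partially: one possible rearrangement (my own sketch, not tested against the larger timing code) is to widen the Float16 ratio to Float32 before multiplying, so the UInt16 values promote to Float32, whose finite range (about 3.4e38) comfortably holds `typemax(UInt16)`:

```julia
p  = Float16[262.0, 289.5, 4.71]
tb = Float16[9.42, -3.555, -7.098]
ts = Float16[357.5, 306.5, 294.8]
nc = UInt16[0xf43a, 0xfff7, 0x312f]

# Widen to Float32 before multiplying by the UInt16 values, so the
# mixed operation promotes to Float32 instead of Float16 and the
# product (at most ~65535) never overflows:
result = floor.(UInt16, Float32.((p .- tb) ./ ts) .* nc)
```

Since the ratio is in [0, 1], the Float32 product stays below `typemax(UInt16)`, so the final `floor.(UInt16, …)` conversion is safe.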

I want to add that I produce the Inf16 regardless of which side of the multiplication I place the UInt16 on:

```julia
julia> nc*((p-tb)/ts)
Inf16

julia> ((p-tb)/ts)*nc
Inf16

julia> ((p-tb)/ts)
Float16(0.956)

julia> nc
0xfff7

julia> Float16(0.956)*0xfff7
Inf16

julia> 0xfff7*Float16(0.956)
Inf16
```

So it has nothing to do with broadcasting or order of operations, and clearly has to do with my lack of understanding concerning the basic operation of multiplication between a UInt16 and a Float16. Apologies if this is obvious to everyone else.
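For anyone else puzzling over this, the mechanism can be seen directly: mixed arithmetic between UInt16 and Float16 promotes both operands to Float16 first, and converting 0xfff7 (= 65527) to Float16 already rounds to Inf16 before the multiplication ever happens. A minimal check:

```julia
# Mixed UInt16/Float16 arithmetic converts the integer to Float16 first
@show promote_type(UInt16, Float16)  # Float16
@show Float16(0xfff7)                # Inf16: 65527 is past Float16's finite range
@show Inf16 * Float16(0.956)         # Inf16: the Inf then propagates
```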

Interestingly, this did not happen at all when doing the same thing between UInt32 and Float32, or UInt64 and Float64, so I am guessing this has to do with overflowing the Float16 specifically:

```julia
julia> 0xffffffff*Float32(0.999)
4.2906724f9

julia> 0xffffffff_ffffffff*Float64(0.999)
1.8428297329635842e19
```

Digging further, it is apparent that I don't understand what the valid range of Float16 is…

```julia
julia> Float16(65_510)
Float16(6.55e4)

julia> Float16(65_520)
Inf16
```

I am guessing this maybe has to do with supporting the Inf value? Even though I expected to be able to get to ±65,504 (from Wikipedia):

> Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16, and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024.

Having said that, the max of UInt16 still would have overflowed:

```julia
julia> Int(typemax(UInt16))
65535
```

What are the true maximums supported by the various Floats prior to hitting Inf?
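In Julia these limits can be queried directly with `floatmax` (the largest finite value of a floating-point type) rather than looked up:

```julia
@show floatmax(Float16)  # 65504 — well below typemax(UInt16) = 65535
@show floatmax(Float32)  # ≈ 3.4028235e38
@show floatmax(Float64)  # ≈ 1.7976931348623157e308

# Converting typemax(UInt16) to Float16 therefore overflows:
@show Float16(typemax(UInt16))  # Inf16
```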

I suppose if I had just scrolled down further on the Wikipedia entry, I would have found that:

> 65519 is the largest number that will round to a finite number (65504), 65520 and larger will round to infinity. This is for round-to-even, other rounding strategies will change this cutoff.
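And indeed that cutoff is easy to verify in Julia, which uses round-to-nearest-even for conversions:

```julia
@show Float16(65519)  # Float16(6.55e4): still rounds down to 65504
@show Float16(65520)  # Inf16: halfway case rounds to 65536, which overflows
```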

Carry on, people, nothing to see here…
