typeof(Int32(1)/Int32(2)) == Float64 @_@


Why isn’t it Float32?


Well, why would it be?


So that I don’t have to track down every division of two low-precision integers and wrap it in a low-precision float conversion. I think it is more natural to assume that if I input Int32, I don’t want Float64 outputs unless explicitly specified. That makes more sense to me, and I hope I am not alone.

Also, BigInt(1)/BigInt(2) is a BigFloat and Int64(1)/Int64(2) is a Float64, so I was surprised that Int32(1)/Int32(2) is not a Float32.
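
For anyone skimming, the three cases mentioned above can be checked directly at the REPL:

```julia
# Integer division with / always produces a floating-point result,
# but the width of the integer type does not carry over:
@assert typeof(Int32(1) / Int32(2))   == Float64   # not Float32!
@assert typeof(Int64(1) / Int64(2))   == Float64
@assert typeof(BigInt(1) / BigInt(2)) == BigFloat
```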


If you expect /(::IntX, ::IntX)::FloatX, what should happen with Int8?


It’s a corner case; it could be defined specially and documented.


Why would the size of the integers have anything to do with the precision of the resulting floats?


Well, for me: I use Int32 and Float32 together to make calculations faster, so having a Float64 show up in the middle of a calculation infects the rest of the numbers, and I lose speed unless I hunt down every such occurrence manually, which seems odd and ugly. Is there an equally important use case for small integers producing high-precision floats? It would also be consistent with the BigFloat and Float64 cases.
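
To illustrate the “infection” (variable names here are made up for the sketch): multiplication keeps Float32, but a single division between Int32s introduces a Float64 that then dominates every later promotion:

```julia
x = Float32(0.5)           # intended working precision
n = Int32(3)

y = x * n                  # Float32 * Int32 stays Float32
r = Int32(1) / Int32(4)    # ...but this division is Float64
z = y * r                  # and now z is Float64 too

@assert typeof(y) == Float32
@assert typeof(r) == Float64
@assert typeof(z) == Float64
```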


This drove me crazy initially as well. But, after some time to get used to it, I don’t consider Julia’s arithmetic widening so bad.

  1. For most code, most of the time, on most archs, the size of your operand makes no big difference (or none at all!) for arithmetic speed. Especially for integer maths. Exceptions: Vector instructions; division; square root; math-lib stuff. See http://www.agner.org/optimize/instruction_tables.pdf.
  2. You generally want to use 32bit in order to save space in structs, fit more of them into memory / cache, save bandwidth.
  3. If you know precisely what you want, then it is no big deal to avoid auto-promotion entirely.
  4. Julia is quite predictable: Either there is an obvious promotion (UInt16+UInt8 = UInt16) or you end up with 64bit variants (Int16+UInt8=Int64, Int8/Int8=Float64) or big variants (BigInt/Int8 = BigFloat). Convert yourself instead of relying on auto-conversions if you want something else; if you don’t care, then your code will become “accidentally type-convergent: everything 64 bit” all by itself.
  5. Exception: Some functions like sum (seriously, WTF?). Depending on your needs, write your own (<10 lines); I personally consider sum mostly unusable for non-64-bit types because I can never remember how they widen operands (Int8->Int32, Int16->Int32, Int32->Int64 on 0.6.2) (API solution: an additional parameter no_widen=false; or get used to it if you can).
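
A few of the promotions from point 4, as checked on a recent Julia (the mixed signed/unsigned cases have shifted between versions, so verify on your own release):

```julia
@assert typeof(UInt16(1) + UInt8(1)) == UInt16   # obvious promotion
@assert typeof(Int8(1) / Int8(2))   == Float64   # division widens to Float64
@assert typeof(BigInt(1) / Int8(2)) == BigFloat  # big variants stay big
```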

TL;DR: /(Int32, Int32) might not give the most sensible return type for your situation, but you don’t need to read the manual in order to see what to expect; instead, call Float32(a)/Float32(b) if that is what you want (or Float32(Float64(a) / Float64(b)) if this is about memory, not arithmetic throughput).


Yeah, this is what I ended up doing. I guess it’s not so bad once I get used to it and depend less on Julia’s type promotion, but it was certainly surprising enough that I thought it was a bug, and still kinda do, actually. If there is a lack of support from the core devs and the community for changing this behaviour, I guess I will just have to deal with it.


Me too! I’ll use sleep deprivation as an excuse for rushing straight to github rather than posting here or on gitter. My bad.

For me, the intuitive behaviour was that things don’t widen; it’s just what I had assumed was supposed to happen, and I ended up spending longer debugging a GPU kernel than I would have liked.
Now that I know, I can see why it makes sense and what its advantages are (it’s predictable once you know, and “accidentally type-convergent: everything 64 bit”, as foobar said). I think things are totally fine and easy to work with as is; it’s just silly how sure I felt at the time that things were supposed to be different.


This has changed on 0.7: sum now always returns Int/UInt for small integer types, and reduce can be used to preserve the input type. See https://github.com/JuliaLang/julia/pull/20607 and https://github.com/JuliaLang/julia/pull/22825.
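
Concretely (on 0.7 and later; note that sum returns the platform Int, so Int64 on a 64-bit machine):

```julia
v = Int8[1, 2, 3]
@assert sum(v) isa Int          # widened accumulator
@assert reduce(+, v) isa Int8   # reduce(+, ...) keeps the element type
@assert reduce(+, v) == 6
```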


Cool, thanks!


I’d actually recommend calling Float32(a/b) instead, i.e. allowing the calculation to be performed using Float64, or whatever a/b promotes to. Remember, a and b could be things like Rational numbers, and you may get a better answer by doing the conversion to Float32 (for storage efficiency) as the very last step.
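
A sketch of the difference, with arbitrary values: Float32(a/b) rounds once at the end, and it also works unchanged when a and b are Rationals:

```julia
a, b = Int32(10), Int32(3)
r = Float32(a / b)        # divide in Float64, round to Float32 once
@assert r isa Float32

# With Rationals the division itself stays exact:
q = (1//3) / (2//5)       # an exact Rational
@assert q == 5//6
@assert Float32(q) isa Float32   # rounded only at the very last step
```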

Also remember, even 16-bit and 32-bit processors with HW floating point typically used 64-bit or 80-bit calculations, no matter what the size of the FP values that were loaded (back to the days of the 8087 co-processor in 1980).
These days, I think it’s pretty much only GPUs (or things like the TPU [Tensor Processing Unit]) that directly perform calculations on 16-bit FP values, and on CPUs, only SIMD instructions that will directly operate on 32-bit FP values.
Since (except for SIMD and GPU operations on vectors of Float16 or Float32) the calculations internally will be done with 64 (or 80) bits, you might as well use that in your code, which may help explain why Julia promotes to Float64 for fp calculations.


AFAIK Julia only uses true 32-bit float arithmetic (SSE registers, xmm*) for Float32 on x86. This is important for me because otherwise moving a number from a register to memory and back would change the result of the computation. I sometimes re-do small floating-point computations if this is cheaper than looking up a stored result, and I rely on getting the exact same result every time (edit: yes, this means I must be very careful about where to avoid @fastmath).


Simply moving a 32-bit value to a 64-bit register and back to 32-bit will not change its value.
That’s interesting, I wasn’t aware that Julia used the SIMD instructions (which don’t convert to the 80-bit format) to multiply scalar values. I haven’t yet been able to track down how Intel operates on 32-bit FP values internally, though; they may still get widened before the operation for a guard digit for rounding.
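
The first claim is easy to check: every Float32 is exactly representable as a Float64, so widening and narrowing back is the identity:

```julia
for x in (1.1f0, -0.0f0, Float32(pi), floatmax(Float32), nextfloat(0.0f0))
    @assert Float32(Float64(x)) === x   # round-trip is lossless
end
```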


My “AFAIK” comes from https://discourse.julialang.org/t/psa-floating-point-arithmetic/8678/7, me never having seen other instructions emitted, and my code not crashing and burning so far :slight_smile:

The bad thing (that julia never does) would be to load 32bit float operands, extend them to 64 bit, operate via 64bit arithmetic, and then round them back to 32bit and store them into memory.

No, but seriously, enjoy the link to the PHP bug I posted in the other thread. Sometimes using higher precision than requested is the devil: in this case we got a catastrophic denial-of-service affecting almost every PHP site at the time, because the 80-bit floating point rounded differently than the intended 64-bit rounding. This was fixed by a volatile in the offending C code, forcing the compiler to spill the register to memory between iterations. (There was also something about the real culprit being the compiler flags used by PHP rather than the original author of the library, who understood his code perfectly well.)

And I am pretty sure that Intel really does 32-bit arithmetic on the vector registers (they won’t tell us what they really do internally, but they will construct a plausible internal state that the CPU could have been in whenever we look, and this should be compatible with the computation having happened in 32 bits).


There is one more reason to use Float64 for /(::Int32, ::Int32): while an Int32 value can be exactly represented as a Float64, it can’t be as a Float32, which to me (someone very concerned about information loss) makes a lot of sense.
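
To make that concrete: 2^24 + 1 is the smallest positive integer that Float32 cannot represent, while every Int32 fits exactly in Float64’s 53-bit significand:

```julia
n = Int32(2)^24 + Int32(1)    # 16777217
@assert Float64(n) == n       # exact round-trip through Float64
@assert Float32(n) != n       # Float32 rounds it to 16777216
```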