Discussion about integer overflow

literal_pow also allows literal powers like x^2 to be type-stable if x is a dimensionful quantity ala Unitful.jl; since small literal powers are so common in numerical code, not allowing this makes it much more painful to write fast dimensionful code. (And if you are working with BigInt, it would be quite annoying if positive exponents suddenly started giving floating-point results, I think.)
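
For instance, a quick illustration with Unitful.jl (the printed form is what I'd expect from a recent version of the package):

julia> using Unitful

julia> square(x) = x^2   # literal exponent, so this lowers to Base.literal_pow
square (generic function with 1 method)

julia> square(1.5u"m")   # type-stable: the m^2 unit is known at compile time
2.25 m^2

With a variable exponent n, the unit of x^n (and hence the result type) would depend on the run-time value of n, which is exactly the instability that literal_pow avoids for small literal powers.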

Honestly, anyone using Julia for more than a few days should realize that Int != Float64, that the default integers are fixed-width machine integers, and that you write floating-point literals as 6.022e23 not 6.022 * 10^23. This is not unusual for compiled languages, and you can hardly claim that no one uses compiled languages for numerical work! It’s not unreasonable to expect anyone writing serious programs to have some notion of how computer arithmetic works.
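
To see the pitfall with writing 6.022 * 10^23 (the exact digits below follow from Int64 wraparound modulo 2^64):

julia> 10^23   # silently overflows Int64
200376420520689664

julia> 6.022e23   # the floating-point literal does what you meant
6.022e23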

Making Int^Int :: Float64 seems far too breaking to happen in Julia 1.x. Making it throw an OverflowError seems more justifiable.

24 Likes

I spent just shy of three decades implementing, rearchitecting, and designing new features for a language / database heavily used in healthcare IT. (Over half the EHRs (electronic health records) in the US are stored using this system - and that's just counting the ones stored by one of the users of the language / database. It is also used by the VA, the DOD, and many other large healthcare companies - and not just in the US.)
I know precisely what it means to have code that must be as well tested as possible, or else people could die.

My $0.02:
It does not make sense to try to override the meaning of Int/Int64 for the entire system; as others have pointed out, that would break many things that depend on the correct modulo arithmetic, and would make other things too slow to be usable.
Being able to change the meaning of integer (and floating-point) literals on a per-module or per-file basis, or via a macro, would be a very good thing to have in Julia.

There is one issue that would need to be addressed to achieve that: currently the parser is lossy, in that it converts floating-point literals immediately into Float64, instead of leaving the original form available so it could be picked up by a macro and, say, converted to BigFloat or Dec64 / Dec128 without any digits being lost before the macro sees the literal. The value returned would need to be some new Literal type that could hold both the value as currently parsed and the actual string (possibly a SubString) with the original literal characters - for example:
Literal(0.12345678901234568, "0.123456789012345678901234567890123")
This would need to be done in such a way that it was optional for Meta.parse (so as not to break existing programs), with some way to enable it for particular macros.
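
To make the idea concrete, here is a minimal sketch of what such a type might look like (the name and fields are purely illustrative, not an actual proposal for Base):

julia> struct Literal{T}
           value::T                  # the value as currently parsed, e.g. a Float64
           text::SubString{String}   # the original characters of the literal
       end

julia> Literal(0.12345678901234568, SubString("0.123456789012345678901234567890123"))
Literal{Float64}(0.12345678901234568, "0.123456789012345678901234567890123")

A macro could then reparse the text field as BigFloat, Dec128, etc., with no digits lost.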

Another issue that hasn't been discussed here, and which is a problem with R, Matlab, Python, etc., as well as Julia (if you end up using Float64), is that binary floating point often has many issues just as important as overflow (in my experience more so, for healthcare applications dealing with things like doses or measurements, which often come in different decimal units (kg, g, mg, microgram, etc.)).
Something simple like adding up a bunch of decimal values such as this:

julia> 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1
0.9999999999999999

can easily lead to problems in a program if one isn’t very careful (and it’s trickier, IMO, than making sure you check for and handle integer overflow).

However, Julia does have a very good solution to those issues.
Instead of using either Int or Float64, use the IEEE 754-2008 Decimal Floating Point formats, in particular, Dec64 or Dec128 (which gives you 34 digits).
If you load the DecFP.jl package, you can benchmark those, and see that 1) they take much less space than either BigInt or BigFloat, and 2) are faster than either of those.
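
For example (using the d"..." string macro that DecFP.jl provides for Dec64 literals):

julia> using DecFP

julia> d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1" + d"0.1"
1.0

Decimal 0.1 is represented exactly, so the sum is exactly 1.0, unlike the Float64 sum above.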

I do realize that for some cases, performance requirements might mean that Int or Float64 will need to be used, and those will need to be reviewed for the issues above (as they would need to be in any language).

@viraltux I do hope you take a look at using DecFP.jl.

17 Likes

I’d go farther to say that it’s unreasonable to let anyone do anything of any consequence if they don’t understand computer arithmetic. Hence the whole “it’s not safe for regulated uses” thing is really a red herring. What’s not safe is to let unqualified people do the work.

7 Likes

In any sufficiently large system you are almost certain to end up with someone unqualified in a position they should not be in. Hence the rules and regulations. And even qualified people can have bad days.

7 Likes

I’ve seen tons of very serious programs written by people (often doctors by training) who don’t have any notion of how binary two’s complement arithmetic or binary floating point works. That did not make them unqualified to do the work (with their extensive domain knowledge, I feel they were more qualified to write the applications than some programmer who might understand C’s “int”, “long”, and “double”, but not the medical issues).

They simply used a language (M/MUMPS/Caché ObjectScript) that did all arithmetic with decimal floating-point values. (InterSystems’ product used a 64-bit signed integer significand with a 1-byte signed exponent, so it could represent values up to (2^63-1) * 10^127, and as small as 10^-128. The language actually required decimal arithmetic because, semantically, all values were strings - there were no types - and the number format was strictly an internal optimization.)
This was all designed long before there were any IEEE standards, either for binary floating point or for decimal. (It also avoided the problem of getting very different results from the vendors’ binary floating-point formats - a program running on a DEC machine might get different results than on an IBM one.)
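
In Julia terms, that representation was roughly equivalent to something like this sketch (the names are mine, purely for illustration - this is not InterSystems' actual code):

julia> struct ScaledDecimal      # represents significand * 10^exponent
           significand::Int64    # -(2^63) .. 2^63 - 1
           exponent::Int8        # -128 .. 127
       end

which covers the stated range from about (2^63-1) * 10^127 down to 10^-128.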

This shielded them from ever having to worry about conversions between decimal and binary representations, or overflow (you’d get an error on overflow), so that even doctors were able to write some incredibly powerful and innovative applications, and not run into most of the numerical problems that have been brought up in this discussion.

4 Likes

Do decimal floats solve the problem though?
They are only exact for addition, subtraction and, to some extent, multiplication of decimal fractions. One division by 3, or taking a square root, and there's a roundoff error, just as there is in the binary representation.
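
For instance, with DecFP.jl (the printed digits assume Dec64's 16 significant decimal digits):

julia> using DecFP

julia> d"1" / d"3"   # rounded after 16 digits, just as 1/3 is rounded in binary
0.3333333333333333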

If a floating-point value is an experimental measurement, then using binary vs decimal floats makes no difference since the measurement uncertainty is vastly larger than the roundoff error in both cases.

They solve the integer overflow problem that @viraltux had issues with (you’d get an error or infinity).
They also solve the most common problems that I’ve seen non-CS types run into.
They are also exact (except when the exponent gets too large, in which case you get an error) for division by powers of 10 (a very common operation when dealing with metric-system units!).

A lot of the healthcare applications deal with things like dosages and measurements, where you have a lot of scaling by powers of 10 (kilo, centi, deci, milli, micro) when dealing with units, and you avoid any representation / rounding errors in those cases. The same goes for billing systems, where you have to deal with currencies that usually have subunits of 100ths or 1000ths.
These programmers are already accustomed to the idea that some things are not representable with fixed precision decimal numbers (irrational numbers, division by anything but multiples of 2 and 5, …), if they’ve ever used a simple calculator.
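
A small illustration of that point, again assuming DecFP.jl's d"..." Dec64 literals (the first result is the classic binary-float scaling surprise):

julia> 4.35 * 100     # binary Float64: the decimal scaling is inexact
434.99999999999994

julia> d"4.35" * 100  # decimal: exact
435.0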

Decimal floating point numbers are not a panacea, by any means, but for many of the healthcare IT use cases that I’ve seen over the years, they do seem like the best bet (and far better in this particular case than using BigInt or BigFloat).

One of the things that I do love about Julia is that the programmer has the freedom to use the types that best suit the particular use case, and (in general) packages just work with whatever numeric type you wish to use - unlike R, Matlab, Python, etc., and libraries like BLAS, which force you to use the “double” / Float64 binary floating-point type.

5 Likes

Very good point about using the best types for the job.
I work in HPC, where the ranking of systems is done using FP64 calculations. Along comes the AI generation, and they can do their work in FP32, or recently TF32, on Nvidia GPUs.
As the future comes relentlessly towards us, there will be more choosing of the appropriate type, rather than FP64 and devil take the hindmost.

ps. I am bursting to mention Unums here. No hardware implementation yet, but I would imagine Julia would cope with them the day after any such thing is released.

2 Likes

EDIT: Floats are not needed or wanted there (TL;DR: the status quo works, or use rationals).

Ok, but isn’t that a wrong result? Is BigInt power (giving BigInt for positive values only) the only type-unstable function in the standard library?

julia> big(5) ^ -1
0.2000000000000000000000000000000000000000000000000000000000000000000000000000004

EDIT: When I think about it, BigInt power (in general, if we want to allow negative powers) should give you a rational of BigInts back. All integer powers are basically rationals in general, but since rationals are slow, we give Float64 as an approximation, just as with division. For positive powers the denominator is 1, so we can, and maybe should, do away with giving any kind of float or rational for Unsigned types. Since there’s no Unsigned type for BigInts(?), I think we’re forced to always give rationals there.
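
To illustrate what that suggestion would look like, using Julia's existing Rational type (this is not current behavior):

julia> inv(big(5)//1)   # a Rational{BigInt} result instead of a BigFloat
1//5

julia> typeof(ans)
Rational{BigInt}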

I realize the code for BigInt is completely different from the code for BigFloat. You could cast back from BigFloat (if we chose it as a default) to BigInt, at a great cost I believe. BigInt is already very slow, and I think not as optimized as it could be, but BigFloat is probably even slower. Would it be possible to unify the two and make BigFloat no slower than BigInt in the special case where it’s actually storing an integer?

I think we should go with float as the default for integer powers; we could keep the status quo for BigInt powers (it seems we need to, as BigFloat (currently) only has so much precision and wouldn’t work for large integers), at least for now. DecFP powers, however, should give you DecFP, not Float64 - that’s not really a problem, nor type-unstable. Just as rationals should keep giving you rationals (checked, as they still are).

You are getting confused because of the special behavior of literal powers:

julia> big(5)^-1
0.2000000000000000000000000000000000000000000000000000000000000000000000000000004

julia> n = -1; big(5)^n
ERROR: DomainError with -1:
`y` cannot be negative.
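
The literal case is lowered to a call to Base.literal_pow, which for a literal -1 exponent reduces to inv (hence the BigFloat result), while the variable-exponent case goes through the ordinary ^ method, which throws for negative exponents:

julia> Base.literal_pow(^, big(5), Val(-1)) == inv(big(5))
true
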
6 Likes

Oh, that again… thanks. Then we could make it work with rationals of BigInts, as I explained in my edited post. Or keep the status quo for BigInt (i.e. that’s not type-unstable, and I guess there’s then no type-unstable function in the standard library, as you’re careful; and BigFloat is not needed or wanted, since negative powers are disallowed - or even if they were allowed).

Evidently, though, they weren’t doing any significant math - like, for example, pharmacokinetics models, or linear regression, or molecular dynamics, or agent-based modeling of interacting plant species, or solutions of PDEs on irregular regions, or developing new hashing functions for IPv6 address and port-number combos to minimize congestion on Ethernet switch LAG groups.

Showing up at the house of the guys who do that stuff and telling them they all need to do all their math using decimal floats stored in strings, so that eye surgeons can code medical billing apps in their language without needing to read https://0.30000000000000004.com/, should get you laughed out of that house. This is what happened in this thread.

If people without any computing background want to write medical billing and dosing apps, they should perhaps choose MUMPS rather than the premier language for high-performance computing of serious mathematical models, where people who understand mathematics congregate specifically because it gives them full power to tell the computer EXACTLY what they want done.

Also, people shouldn’t show up at machine shops and start tinkering with CNC lathes without some training in how they work… you can get hurt or break expensive tools. That doesn’t mean we should replace all CNC equipment with rubber Fisher-Price toys. It also doesn’t mean everyone needs to take a 6-week course in shop safety if all they want is a hammer and a power screwdriver (MUMPS).

The point is that Julia is a power user’s tool, where people who know what they are doing don’t have to fight with the tool to get it to do exactly what they want. If 2^11 suddenly means a floating-point number, and so do 3*7 and 9+5, that’s not quite so true anymore, because you have to fight with the tool to get integer arithmetic. Octave and R made those choices, and as a result they aren’t suitable for designing hashing algorithms or anything else requiring integer math.

2 Likes

To be clear, decimal floats do not need to be stored in strings. There is an IEEE decimal-floating point format using fixed-width binary-coded decimal, available in Julia via the DecFP.jl package. It is implemented only in software on the hardware supported by Julia, so it is slower than binary floating-point, but it’s still useful e.g. in financial applications where exact representation of human decimal inputs is required.

Note also that, contrary to the popular superstition that 1.0 + 1.0 == 2.0 ± ε, ordinary binary floats ala Float64 are perfectly exact for working with integers up to 2^53 == maxintfloat(Float64). 3.0 * 7.0 and 9.0 + 5.0 are doing exact integer arithmetic, which happens to be stored as Float64.
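
For instance:

julia> maxintfloat(Float64) == 2.0^53
true

julia> 3.0 * 7.0 == 21 && 9.0 + 5.0 == 14   # exact, no rounding anywhere
true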

That being said, I don’t think we seriously want to make all integers default to floating-point storage (ala Javascript, Matlab, or R). Putting aside questions of range of exactness (53 vs 64 bits), computer hardware is optimized to perform indexing operations (e.g. memory access) using integers, and using floats everywhere would lead to a massive slowdown. People only don’t care about this in languages where loops are slow anyway.

Parsing decimal strings to Float64 and converting back to strings is actually lossless in Julia — as long as your decimal has < 14 digits, which should be true for any decimal that is worth representing exactly — because the float-printing algorithm (Ryu, formerly Grisu) prints the shortest decimal that rounds to the same binary-float value. See this discussion.
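
For example:

julia> string(parse(Float64, "0.123456789012345"))   # 15 significant digits round-trips
"0.123456789012345"

julia> string(0.1)   # the shortest decimal that rounds to the same binary value
"0.1"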

12 Likes

The implication is clearly that the listed languages are serious numerical analysis languages while Julia is not. Specifically because they ensure that the programmer won’t be surprised by results of computation. As we’ve seen, it takes some expertise to understand why the implied statement is false. I’ll add one more refutation, one that has been well understood and discussed for decades. Namely, Mathematica and Matlab are closed source. In any language, the generic case is that you cannot test all inputs that will arise. But in Julia, Python, and R, you still might be able to audit a coded algorithm to gain a higher level of confidence in correctness. A special function is just one example. With the closed-source languages you have no chance.

7 Likes

This was what I was referring to. This is like the polar opposite of Julia. Everything is semantically a string, vs everything is semantically exactly whatever type you told Julia to make it. Julia is the place where types REALLY MATTER ON PURPOSE.

Yes, the original quote was unfortunately a bit incendiary. But realize that @viraltux is excluding nearly all compiled languages (from Fortran to Fortress, or from Algol to ZPL) — even though languages like Matlab and R rely on compiled code from those non-numerical(?) languages for their core numerical operations. And for exclusively interpreted languages, this is not totally unreasonable.

Basically, if you are designing an interpreted language where loops are slow, and you are content for them to always be slow — because users will rely on libraries written in other languages for performance-critical tasks — it makes sense for integer literals to translate to some BigInt-like type that never overflows (ala Python), or perhaps to use floating-point storage for integer literals if your language is not intended as a general-purpose language (ala Matlab or R).

In compiled languages, that’s too much of a performance hit to accept, so virtually all compiled languages default to fixed-precision hardware integers that can overflow, and typically do not guarantee checked overflow except in certain debugging modes. And all serious numerical work ultimately relies on compiled languages for performance reasons. Julia is a performance-oriented compiled language designed to make it possible to write critical inner-loop code (rather than relying on opaque libraries), so it has to make that tradeoff as well.

It’s not a big tradeoff — programmers quickly get used to the idea that there is more than one numeric type for representing real numbers, and that you use a different type for loop counters and array indices than you do for numerical-analysis calculations — but it is a tradeoff.

We can recognize that this tradeoff can be a bit jarring for someone coming from R or Mathematica in their first few days of using Julia, and do what we can to be sympathetic and ease this transition (perhaps by throwing OverflowError from Int^Int), while at the same time being clear that this is the reasonable & commonplace choice Julia has made — to be a fast, general-purpose, compiled language — and it’s not going to change.

24 Likes

But this isn’t just about performance; it’s also about semantics. If you want a signed two’s complement integer, you write 37 in Julia; if you want a 64-bit IEEE float, you write 37.0. The two mean different things, and are supposed to. Saying things like we are “allowing incorrect arithmetic” is like a slap in the face. It’s as if someone has curled their hair, and you walk up to them and say “ugh, your hair is so curly, why don’t you get a straightening iron and fix it?” They went out of their way to make it that way!

Granted, Julia isn’t as beginner friendly as Excel. But should it be?

I was not able to follow the whole discussion in detail, so I might be repeating something that has already been said, but I have used Julia every day for many years now, and here is my experience:

  • in data science, computing is done mostly on floats;
  • if integers are used, most likely they represent some kind of factor (so it does not matter if they overflow), or they are used for bitwise operations, where you actually want overflow;
  • other than that, integers are used for indexing; and here the only catch is that 32-bit Julia uses Int32 by default, so one has to constantly perform checks against this type in library code (Int64 is more than enough in these cases);
  • indeed, in some application areas (like combinatorics) integers are used for computation and can overflow, but then I always know upfront that I am at risk of such a situation, and either use BigInt or use safe addition etc. (see the sketch below).
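
For reference, the "safe addition" mentioned in the last point can be done with the checked arithmetic functions in Base (shown here for Int64; the exact error text may vary between Julia versions):

julia> using Base.Checked: checked_add

julia> checked_add(typemax(Int64), 1)
ERROR: OverflowError: 9223372036854775807 + 1 overflowed for type Int64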

This is just to share my experience from practical use of the language.

5 Likes