I will, however, respectfully disagree with this design decision. I would say people doing analytics would much rather have Julia throw an overflow error than get incorrect results in their calculations.
How do those people feel about for loops being slow? Why is that relevant, you ask? Because for loops are implemented using integer arithmetic. You slow down integer arithmetic and you slow down the entire language.
I would say those people might expect that, in a high-level language intended for numerical analysis, correct results come before performance, and they might expect not to have to worry about C++-like considerations when handling numbers. I would also say that, if they want performance, they would be happy to have a package like FastIntegers.jl (just an example), knowing the kind of problems they might run into if they use it.
However, if I understood you correctly, it seems that integer arithmetic is so intertwined with Julia that any safeguard would cause a general slowdown. I have a question, if I may: would just throwing an error also slow down the entire language?
I am asking because there might be strong regulatory implications; people's lives might depend on the results of some analytics.
Yes. Throwing an error is a massive slowdown. To do so, you have to check for overflow on every operation, and the possibility of throwing errors means that the compiler can’t optimize your code, since things like vectorization rely on arithmetic not having side effects.
Also, note that Julia has a package, SaferIntegers.jl, which you can use if you want errors on overflow.
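To make the trade-off concrete, here is a minimal sketch using only Base Julia, not any package API. Default Int64 arithmetic wraps around silently, while the checked variants in Base.Checked test every operation for overflow and throw instead. The helper name `try_checked_add` is made up for this illustration.

```julia
using Base.Checked: checked_add

# Default machine arithmetic: wraps around silently on overflow.
wrapped = typemax(Int64) + 1
println(wrapped == typemin(Int64))   # wrapped around to the smallest Int64

# Hypothetical helper (not part of Base): returns the sum,
# or `nothing` if the addition overflowed.
function try_checked_add(a::Int64, b::Int64)
    try
        return checked_add(a, b)
    catch err
        err isa OverflowError || rethrow()
        return nothing
    end
end

println(try_checked_add(Int64(1), Int64(2)))
println(try_checked_add(typemax(Int64), Int64(1)))
```

The extra branch inside `checked_add` is the per-operation cost being discussed; the try/catch is only there so the example can report the failure instead of stopping.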
At this point I’ve been involved in a lot of discussions with customers using Julia in a variety of heavily regulated industries: finance, pharma, medicine, insurance, aviation, aerospace, etc. While they have many concerns, I can’t say that anyone in any of those industries has ever expressed concern about integer overflow as a regulatory issue. If it were a big issue, it would rule out the use of many languages in those industries, including Fortran, C, C++, Java and C#. Moreover, since every fast language has integer overflow, even if you use Python, most of your computation is still happening in a fast language that has integers that overflow. So while I agree that in an ideal world we wouldn’t have to make this trade-off, we live in a world where checking every integer operation is prohibitively costly, and if we want Julia to be a language in which you can write code that’s as fast as possible (which we do), this is the call we’re forced to make.
Regarding Python: the standard int type does not overflow and is thus comparable to BigInt in Julia. But it is also very slow.
If you speed up Python using NumPy, Numba or Cython, machine-type integers are used, and those do overflow without warning.
Well, now you have found the first one. I work for Big Pharma and, in some of my past projects, worked closely with the pharmacokinetics and pharmacodynamics crowd. These people need to have their models approved by regulatory bodies. How do you think they will react when they find out that Julia, by design, accepts 1 + 1 = 2.12?
They might ask the following question, and rightly so: “Wait a minute, you are telling me that my model can be mathematically correct but that I cannot expect mathematically correct results?” If we now mention Julia's speed, the only thing they are going to hear is how fast Julia fails.
I am using the R language as an example because R is widely used in Pharma when regulations matter, and it is one of the languages they use to have their models approved.
Regulated industries are not regulated at every step of the way. For instance, I have also been involved in biomedical imaging projects where we could use anything we wanted (any language, any hardware, anything), because when it comes to research we don’t have any technological regulatory constraints.
However, if you need your research to be approved, that’s a different world. For instance, even though R is accepted by regulatory bodies in Pharma, it is not all of R, in all versions and with all packages, that is accepted.
There is not one single language intended for numerical analysis that I know of (SAS, R, S, Matlab, Mathematica, Maxima, Octave, SPSS,… a few others), not one, that allows for incorrect arithmetic of the kind I showed you above. Not one except Julia now.
Obviously, these languages might all be using C++ or Fortran under the hood, but they all still guarantee that a correct formula returns correct results.
Stefan, I believe that when we try hard often we can find ways to have the best of both worlds…
For instance, how about implementing a flag for a safe Julia (e.g. julia --safe)? This way people worried about regulations could develop their models in the standard fast julia, but have them approved by running the very same models in julia --safe mode.
This would make Julia not only the fastest language, but the safest too, since in safe mode all kinds of fancy safety measures could be put in place with no concern for speed… Just an idea.
This also shows that R is lying to you - ~1 + ~1 is only ~2 after all and not exactly 2.
Interestingly, I believe the majority of languages you’ve quoted do actually have the same “problem” - they just mask it by showing fewer decimals than would be required to accurately represent the true number. For example, both Matlab and Mathematica lie to you in the same way. See here for Matlab and here for Mathematica:
By default, the inputs 0.1 and 0.2 in the example are taken to have MachinePrecision. At a common MachinePrecision of 15.9546 digits, 0.1 + 0.2 actually has a [FullForm] of 0.30000000000000004, but is printed as 0.3.
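The same effect can be reproduced in Julia, which prints enough digits to round-trip the Float64 value rather than hiding the rounding error. A small sketch, using only Base:

```julia
x = 0.1 + 0.2

println(x)                  # prints 0.30000000000000004
println(x == 0.3)           # false: the two Float64 values are one ulp apart
println(isapprox(x, 0.3))   # true: compares within a relative tolerance
println(big(0.1))           # the exact binary value stored for the literal 0.1
```

The difference between languages here is in how the result is displayed, not in the arithmetic itself.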
Arguably, your example is hitting a different failure mode (integer overflow) than just floating point imprecision, so I guess that’s a point to be made? For that though, SaferIntegers.jl has been suggested and would show you the problem right away.
Unfortunately your proposed safe mode would actually change the semantics of Julia - the overflow behaviour of Int64 and friends is part of the language.
This is all documented. Ints should only be used for “small” integers (such as for counting things). For large or continuous quantities there are plenty of alternatives: floats; big ints; checked integers.
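As a rough illustration of those alternatives, the sketch below computes a value around 2^64 in each of the ways mentioned, using only Base Julia. The variable names are for illustration only.

```julia
using Base.Checked: checked_mul

# Int64: 2^64 does not fit, and the result wraps around to 0.
as_int = Int64(2)^64

# Float64: much larger range, at the cost of roughly 16 significant digits.
as_float = 2.0^64

# BigInt: arbitrary precision, exact but slower and heap-allocated.
as_big = big(2)^64

# Checked arithmetic: throws OverflowError instead of wrapping.
overflowed = try
    checked_mul(typemax(Int64), Int64(2))
    false
catch err
    err isa OverflowError
end

println((as_int, as_float, as_big, overflowed))
```

Which one fits depends on whether the quantity is a small count, a continuous measurement, or an exact large integer.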