Discussion about integer overflow

This is a problem I ran into where array sizes came from a binary exchange format using narrower integers (UInt32 in this case). Passing these directly to Mmap.mmap led, in some cases, to the allocation of a memory mapping that was too small, which then resulted in a segfault or some other error.

The result was: Avoid potential integer overflow in Mmap.mmap by kkretschmer · Pull Request #41186 · JuliaLang/julia · GitHub
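To illustrate the failure mode (a sketch only, not the actual change made in the pull request): multiplying two UInt32 dimensions stays in UInt32 and silently wraps, so any byte count derived from the product comes out too small. The variable names and the 70,000 by 70,000 shape are made up for illustration.

```julia
# Hypothetical dimensions, as they might be read from a binary header.
rows = UInt32(70_000)
cols = UInt32(70_000)

# UInt32 * UInt32 stays UInt32 and wraps modulo 2^32.
wrapped = rows * cols
println(Int(wrapped))       # 605032704, far less than the true element count

# Widening before multiplying gives the true count.
widened = Int64(rows) * Int64(cols)
println(widened)            # 4900000000

# Alternatively, fail loudly instead of wrapping.
try
    Base.checked_mul(rows, cols)
catch err
    println(typeof(err))    # OverflowError
end
```

Widening at the boundary where the narrow integers enter the program is usually the least intrusive option; `Base.checked_mul` is useful when the widened product could itself overflow.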

2 Likes

30 vs 3 mg is still a 10-fold difference, and doing a simple PKPD simulation with both doses will give you a proportional change in the metrics, e.g. AUC. Such an error would be obvious to the analyst before it is even passed down to a colleague for QC. While I understand the spirit of the discussion is to ensure the tools are failsafe, in my experience, being in Pharma too, the source of errors is usually the analyst or the assumptions that they make. As @mbauman and others pointed out, being regulatory compliant and following the process is something that is taken very seriously.

Speaking of R as regulatory compliant, that took its own sweet time, close to 12 years before the FDA/EMA started accepting it. But if one puts that aside, the fact of the matter is that regulatory bodies have no mandate on what tool to use, and they cannot as they have to stay unbiased. The regulatory concerns that are being mentioned in this discussion are mostly related to business process and business users of the tools.

6 Likes

Because of overflows, and getting rid of them with this idea would be a great option:

Doesn’t need to be.*

That was about changing the default, but the default type for integers doesn’t need to change, only the non-default option, so we do not have to wait for Julia 2.0 (that would only be needed if we changed the default later, which I’m not proposing now).

The question is what should replace the default integers, and I’m a bit conflicted, BigInt or SaferIntegers.jl. The argument for the former is that it’s already built into Julia, and it would give a great incentive to optimize it. I know it’s possible (even already done in some package?). Julia has a policy of not adding stuff Julia itself doesn’t need. That would have been an argument to not have BigInt and rationals in the first place… I would personally be ok with adding SaferIntegers.jl as a stdlib (and even removing BigInt if it’s chosen, moved to a package).

People might say you can already just use e.g. SaferIntegers.jl yourself and change the integer literals in the REPL as an option through a package available. But you have to opt into that from the REPL (or rely on package authors to change to an alternative type). I think we should also have the option for scripts, so that end-users can change the default without changing any code.

* What I have in mind, is that even if the default for literals would change, we could still use machine integers for stuff that matters e.g. array indexing (or even floats seems to work for JavaScript… I know V8 changes to integers behind the scenes). I’m not arguing for making this option fast right away, just available as a non-default option, that people would actually know about if looking at: julia --help

I have my own ideas about how to make overflow checks fast, mostly by avoiding them in a lot of cases. Even a compromise of NOT checking for them for additions only (the most important operation) would go a long way. Addition is hardly ever the cause of overflow for 64-bit integers. The cause is usually direct multiplication or e.g. exponentiation, and checks on those can be optimized.
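A rough sketch of what that compromise could look like. `MulChecked` is a hypothetical wrapper type invented for this illustration (it is not part of Base or of SaferIntegers.jl); the only existing function it relies on is `Base.checked_mul`.

```julia
# Hypothetical wrapper: additions wrap as usual, multiplications are checked.
struct MulChecked
    value::Int64
end

# Unchecked: same wrapping behavior as a plain Int64 addition.
Base.:+(a::MulChecked, b::MulChecked) = MulChecked(a.value + b.value)

# Checked: throws OverflowError instead of wrapping.
Base.:*(a::MulChecked, b::MulChecked) = MulChecked(Base.checked_mul(a.value, b.value))

# Addition at the boundary wraps silently.
println((MulChecked(typemax(Int64)) + MulChecked(1)).value == typemin(Int64))  # true

# Multiplication that would overflow is caught.
try
    MulChecked(Int64(2)^40) * MulChecked(Int64(2)^40)
catch err
    println(typeof(err))  # OverflowError
end
```

Whether leaving additions unchecked is acceptable depends on the application; the sketch only shows that the split is easy to express.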


It’s intriguing what can happen with overflows. They have already been warned against (not exploited, yet?) in C, for (most of?) RTOS operating systems and more. Hypothetically the same could happen in pure Julia code, even if Julia itself were fixed, because the issue sits at a lower level, in libc/malloc/calloc, where it is not fixed; it could even happen in (otherwise) safe Java. See: Multiple RTOS (Update E) | CISA

EXECUTIVE SUMMARY

  • ATTENTION: Exploitable remotely/low attack complexity
  • Vendors: Multiple
  • Equipment: Multiple
  • Vulnerabilities: Integer Overflow or Wraparound

E.g. BadAlloc Vulnerability Affecting BlackBerry QNX RTOS | CISA

On August 17, 2021, BlackBerry publicly disclosed that its QNX Real Time Operating System (RTOS) is affected by a BadAlloc vulnerability—CVE-2021-22156. BadAlloc is a collection of vulnerabilities affecting multiple RTOSs and supporting libraries.[1] A remote attacker could exploit CVE-2021-22156 to cause a denial-of-service condition or execute arbitrary code on affected devices.[2]
[…]
CVE-2021-22156 is an integer overflow vulnerability affecting the calloc() function in the C runtime library of multiple BlackBerry QNX products.

It may be overblown as nuclear reactors shouldn’t have an internet connection?

1 Like

The one possible action item here is to add a warning for integer overflow in integer exponentiation. I’m pretty sure we could do it with a relatively small performance hit (10-20%), and it is not an operation that is especially performance critical.

Edit: In fact @keno made a PR for this in 2017 https://github.com/JuliaLang/julia/pull/21600/files
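For readers who want to see the idea concretely, here is a minimal sketch of an integer power that warns on overflow. It is not the code from the PR; `pow_with_warning` is a hypothetical name, and the simple multiplication loop is chosen for clarity, not speed. It relies on `Base.Checked.mul_with_overflow`, which returns the wrapped product together with an overflow flag.

```julia
using Base.Checked: mul_with_overflow

# Hypothetical helper: same wrapped result as base^exponent, plus a warning on overflow.
function pow_with_warning(base::Int64, exponent::Int64)
    exponent >= 0 || throw(DomainError(exponent, "negative exponents are out of scope for this sketch"))
    result = one(Int64)
    overflowed = false
    for _ in 1:exponent
        # Multiply once more and remember whether any step overflowed.
        result, flag = mul_with_overflow(result, base)
        overflowed |= flag
    end
    if overflowed
        @warn "Integer overflow in $base^$exponent; consider big($base)^$exponent or a floating-point base"
    end
    return result
end

println(pow_with_warning(10, 18))  # 1000000000000000000, no warning
println(pow_with_warning(10, 19))  # -8446744073709551616, with a warning
```

A real implementation would presumably use power by squaring; the loop here trades speed for an overflow flag that is easy to reason about.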

8 Likes

Hey! Long time no see…

Discussion that I was trying to move away from and refocus multiple times into a best-of-both-worlds conversation, without much success unfortunately… :)

It’s just challenging because Julia is not at that stage of development anymore. Lack of “success” here has less to do with the conversation itself and more about its feasibility. I can guarantee you that julia --check-overflow would be broken before it even gets to the REPL. It’d take significant effort to make this work, and does not have an appreciable impact on any of the compliance standards I’ve been working with.

3 Likes

Exactly my point, with all the checks that are technically possible and still took 12 years.

It’s that hard to implement? Well, it is the way it is; in the end the community of users will decide in which direction to go.

On the other hand:

R> x = 2^60 + 100; y = 2^60; sqrt(x-y)
[1] 0

R> x = 2^60 + 100.; y = 2^60; sqrt(x-y)
[1] 0

and in Julia

julia> x = 2^60 + 100; y = 2^60; sqrt(x-y)
10.0

julia> x = 2^60 + 100.; y = 2^60; sqrt(x-y)
0.0

And that one might be a hard problem to spot. I don’t think it should be controversial that there are many more ways in which Julia can fail than R, simply because Julia does not check for overflows and R does.

R does NOT check for overflows, afaik; it just makes it hard to use integers by default.
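The zeros in the R session come from floating-point spacing, not from an overflow check. A short Julia check of that arithmetic (a sketch; `Int64` is written out so it does not depend on the platform's default `Int`):

```julia
x = 2.0^60

# Adjacent Float64 values near 2^60 are 256 apart.
println(eps(x))          # 256.0

# Adding 100 is less than half that gap, so it rounds back to x.
println(x + 100 == x)    # true

# Adding 200 is more than half the gap, so it rounds up to the next value.
println(x + 200 == x)    # false

# With 64-bit integers the difference is exact.
a = Int64(2)^60 + 100
b = Int64(2)^60
println(sqrt(a - b))     # 10.0
```

So in this example the integer computation is the one that gives the mathematically expected answer.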

4 Likes

I would really like to focus on the best-of-both-worlds scenarios, if that is even possible. The back and forth about “Julia fails but R fails too”… I don’t think this is taking us anywhere.

Something that could help new users, and would have zero impact on run-time performance, is to add a warning at parse level for the literal 10 raised to literal integer powers that overflow. The warning would read something like:

“The expression 10^21 results in integer overflow. To avoid overflow, use 1e21 for a floating-point value, or big(10)^21 for a BigInt. Write (10)^21 to compute an Int64 without this warning.”
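A minimal sketch of such a check, written as a linter-style pass over a parsed expression and not as a change to the parser. `overflowing_literal_powers` is a hypothetical helper; it flags any literal integer base (not only 10), and it does not implement the proposed `(10)^21` escape hatch.

```julia
# Hypothetical helper: collect literal integer powers whose exact value does not fit in Int.
function overflowing_literal_powers(ex, found = Expr[])
    if ex isa Expr
        is_literal_pow = ex.head == :call && length(ex.args) == 3 &&
                         ex.args[1] == :^ &&
                         ex.args[2] isa Int && ex.args[3] isa Int && ex.args[3] >= 0
        if is_literal_pow
            # Compute the exact value with BigInt and compare against the Int range.
            exact = big(ex.args[2])^ex.args[3]
            if !(typemin(Int) <= exact <= typemax(Int))
                push!(found, ex)
            end
        end
        # Recurse into sub-expressions.
        for arg in ex.args
            overflowing_literal_powers(arg, found)
        end
    end
    return found
end

for ex in overflowing_literal_powers(Meta.parse("x = 10^21 + 2^10"))
    println("The expression ", ex, " results in integer overflow.")
end
```

Because the check runs on the syntax tree only, it costs nothing at run time, which is the property the proposal is after.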

7 Likes

Exactly, I actually think this entire discussion is really fixable with a Linter

6 Likes

My point is that griping about the lack of success in a two-day conversation with lots of engagement and that has remained remarkably productive with even syntax suggestions and an escalation to triage from core developers is premature. These things take time, energy, and effort in addition to the consensus-building… and everyone’s prioritizations are different.

18 Likes

I don’t think it is just me; consider that every other language oriented to numerical analysis decided to do these checks. Is it possible that all of them were wrong? That their languages could be faster avoiding those checks and still have a “slight” improvement? It seems unlikely.

THERE ARE NO CHECKS! they just decided that you can by default only get floating point numbers! They took away options. You are arguing for example that people who do cryptography should have to jump through hoops to get integers and manipulate them. That’s unacceptable.

in R:

> typeof(2L)
[1] "integer"
> typeof(2L^65)
[1] "double"

Julia is like 100x faster than R; that’s not a slight improvement. You simply CANNOT do problems in R that you can do in Julia.

Consider that if Julia automatically promoted anything^2 to float you would have NO autodifferentiation or type stability or anything.
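To make the type-stability point concrete, here is a small sketch. `promoting_square` is a hypothetical function that mimics "promote to float when the integer result would not fit"; it is not how any language actually implements this.

```julia
# Largest magnitude whose square still fits in Int64.
const LIMIT = isqrt(typemax(Int64))

# Hypothetical auto-promoting square: Int64 when it fits, Float64 otherwise.
promoting_square(x::Int64) = -LIMIT <= x <= LIMIT ? x^2 : Float64(x)^2

# Plain square: Int64 in, Int64 out, always.
plain_square(x::Int64) = x^2

# The inferred return type of the plain version is a single concrete type...
println(Base.return_types(plain_square, (Int64,)))

# ...while the promoting version can return either of two types, depending on the value.
println(Base.return_types(promoting_square, (Int64,)))
```

When the return type depends on the value instead of on the argument types, code downstream has to handle both possibilities, which is what type instability means here.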

21 Likes

That’s simply not true. Numpy/Pandas is a great example here — although Python itself will auto-promote its integers to bignums as needed, the moment you move into the data analysis stacks you get the same overflow behaviors as Julia. So not only do they overflow like Julia, they do so inconsistently within the same language.

R and Matlab make it difficult to get integer behaviors in the first place and instead lean more heavily on floating point numbers — which are tricky in their own right and don’t behave like scientists typically expect, either.

15 Likes

Well, I would not include Python in the numerical analysis group any more than Java, or any other generic language. Point taken though.

Actually I am very glad to hear this Matt! Conversations among people having an honest intent to improve things can be very productive. I am very glad to see positive outcomes, thank you for sharing!

The conversation has been pretty lengthy, and I think that even though we might not agree on everything, we have all shared our points of view as clearly as we could. So besides you, I will also extend my thanks to all of you who engaged in the conversation with an honest intent of improving the language we all love, and especially to the core developers for sharing their knowledge and experience with us.

To those seemingly troubled people using Discourse to vent their frustrations, I hope they can find peace.

10 Likes

My first calculator was the 2nd generation HP21. The second was the HP 34C, which had the magical integrate and solve keys. At some point I found an article written by the professor who implemented the numerical algorithms used by these keys. One of the things that sticks out is that if a numerical algorithm is well understood, it is possible to make a valid problem where it will give the incorrect result.
I really appreciated this discussion, where some of the design philosophy behind the Julia implementations of algorithms has been discussed, along with the tradeoffs of the various design decisions, not only in speed but in failure mechanisms.
As an engineer, the onus is on me to understand the tools that I use and ensure that the results are reasonable. This has been emphasized in this discussion, that testing is extremely important. If a roof truss system fails, it is the responsibility of the engineer who signed off on the drawings, not the responsibility of the person who wrote the FEA software that performed the roof truss calculations.
Personally I am extremely happy that the Julia core developers have been able to discuss their design decisions and show why they make sense and make Julia into a more versatile language than many of the other languages discussed. I am also glad that they did not make their design decisions based on being the same as all the other languages out there. I appreciate the rational approach. Keep up the good work!

14 Likes