Discussion about integer overflow

  1. Number of overflow scenarios where the language returns either an error or a sentinel value (e.g. Inf) that, while not the true result, lets the user stop the calculation or handle the overflow appropriately.
  2. This should be straightforward.
  3. This one should be straightforward too I think.

That’s exactly how it works; there you have the pharmacometrician doing 2^64 and giving negative doses to patients. Thank you for taking it seriously Stefan.

That’s very true… However, some tools will have it easier. In theory we could use assembly language to design a clinical trial, but that’s not going to happen.

I’m a chemist and a lecturer, and I have many friends working in industry. I have seen all kinds of absurd concentrations coming from calculations, most of which were done by hand. The error never came from the overflow of a number in a calculator. Almost always it came from human error in the input data, particularly with units.

I do think that a package, or an environment that forces a user to stick to certain rules and checks values and units for the user would be of great importance. But I really cannot think of integer overflow being one of the important checks (though it could be there as well), it feels like a problem not much connected to actual research and development problems of any product. Missing a dose by a factor of 10 can also kill a patient.

3 Likes

I now work in pharma, but I’ve also worked in the aerospace industry, which has its own strict compliance protocols to follow.

As @mbauman mentioned, it is not really about the tool: if we build a Julia piece of analytics with very thorough checks on every input and every calculation, we could potentially demonstrate that no error can take place in that constrained environment, as long as every package that our analytics uses guarantees the same.

However, let’s now try to place ourselves in the position of a pharmacometrician. Would we rather care mainly about the mathematics of our model, knowing that the tools we’re using have all kinds of numerical checks in place, or would we rather worry not just about our models but also about low-level computation problems?

An extreme case would be choosing between assembly and R. Which language would we choose as pharmacometricians? The one where we are at risk at every operation, or the one that has every possible check in place and allows us to focus on the formulas?

And if we choose assembly and something goes wrong, how will we answer the question “Why didn’t you use R instead?”

1 Like

If you’re into research, absurd concentrations are not a problem; in research nothing is, and we use whatever works best.

In the unfortunate case that a patient dies, we want to make sure we can explain our choices; the speed of a language won’t be enough.

Well, I would challenge you to find a single example, among the many thousands of cases of bad research and development of a drug, where the problem was the precision of the arithmetic of the language used in a program. I think you are trying to push a hypothetical problem to an extreme that does not exist.

2 Likes

At this point, I think it’s worth putting on my other hat and saying that both Pumas-AI and Julia Computing take compliance needs very seriously. I’m spending a large portion of my day working towards 21 CFR Part 11, Annex 11, SOC 2 and other GxP processes. Modular integer arithmetic is only one small part here, and while you may disagree with this choice, there are many other features of Julia that lend themselves very well to these standards, first and foremost its reproducibility. If you’re interested in a compliant environment, I encourage you to email compliance@pumas.ai or compliance@juliacomputing.com.

18 Likes

Maybe we won’t find any because they use SAS and R?

As a matter of fact, I don’t disagree with the choice, because I don’t want to lose Julia’s speed either. Hence all the suggestions about --check-overflow options, so we can have the best of both worlds.

Thank you, good to know!

2 Likes

How is any of this different from “it’s possible to call a non-stiff ODE solver on a stiff equation and get the wrong estimate of the concentration at 1 hr”, or “it’s possible to call an MCMC sampler and have it fail to converge”, or “it’s possible to enter 1/(1+ exp(-x)) and have the user call this for a large negative x and the formula will give a wrong answer”?

Doing things that are known to be numerically bad will be bad… It’s true in ANY language. R won’t keep you from getting the wrong answer if you apply function(x){1/(1+exp(-x))} to a sufficiently negative value (searching for a proper example).
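
To make the logistic example concrete, here is a minimal Python sketch (Python is used purely for illustration, and both function names are hypothetical). In Python, `math.exp` raises `OverflowError` for arguments above roughly 709.78, so the naive formula fails outright for a large negative x; other environments may instead produce Inf in the denominator. The second function is one common rearrangement that never exponentiates a positive number.

```python
import math

def naive_logistic(x: float) -> float:
    # Direct transcription of 1/(1+exp(-x)); exp(-x) overflows for very negative x.
    return 1.0 / (1.0 + math.exp(-x))

def stable_logistic(x: float) -> float:
    # Only ever exponentiate a non-positive number, so exp() stays in [0, 1].
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

For moderate inputs the two functions agree; they differ only where the naive form leaves the representable range.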

12 Likes

Ok, this is easier to give clarity on:

  1. R’s integers may or may not overflow – user code may or may not overflow depending on whether it calls overflow-safe operations like + or overflow-unsafe operations like reading CSV files. (R can silently produce garbage if you read in data generated by a system that uses 64-bit integers; Julia will not corrupt such data.)
  2. Julia’s integers always overflow.
  3. R and Julia have almost the exact same floating point semantics for overflow. The only difference is that R uses NaN in a strange way that produces different results depending on compiler settings.
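
One way the corruption mentioned in point 1 can arise is when a large 64-bit integer ends up stored as a 64-bit double. The Python sketch below only illustrates that mechanism; it is an assumption for illustration, not a description of any particular R function, and `through_double` is a hypothetical helper.

```python
def through_double(n: int) -> int:
    # Simulate a reader that keeps every number as a 64-bit double.
    return int(float(n))

big_id = 2**53 + 1                     # fits comfortably in a signed 64-bit integer
round_tripped = through_double(big_id)
changed = round_tripped != big_id      # True: the value was altered with no error raised
```

Doubles carry 53 bits of significand, so integers above 2^53 are no longer all representable, while smaller values survive the round trip unchanged.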

So I think the score is either even or slightly in Julia’s favor. But you seem to have a different understanding, which I’d like you to flesh out in some greater detail.

5 Likes

I am taking it quite seriously, and this is a real error analysis based on modular arithmetic. Suppose someone were to write code that used Int values to compute a drug dosage. This is admittedly somewhat implausible, but let’s suppose they do, since that’s where all of this started: with the supposition that someone might be using integer arithmetic to compute drug dosages and that this might lead to fatally incorrect answers. Then the answer they get would be correct modulo 2^64, with representatives of the equivalence classes taken from the set of values -2^63:2^63-1.

That means that if the true answer they should have gotten is in that range, then they will get the exact right answer. If the true answer is out of that range, then they’ll get an incorrect answer (still correct in modular arithmetic). Thus, it’s impossible for modular integer behavior to result in an overdose, only an underdose. Of course an underdose could be fatal too, but if you know that valid doses always lie in the range -2^63:2^63-1, then the answer cannot be incorrect, even if some intervening operations overflow or underflow. This is actually a much better guarantee than either floating-point or saturating arithmetic (like Matlab uses for integers) can provide: both of those can be arbitrarily wrong even when a correct answer would be in a very modest range.
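
The argument can be checked with a small simulation. The Python sketch below emulates 64-bit wrapping and saturating arithmetic on Python's unbounded integers; `wrap64` and `sat64` are hypothetical helpers written for this illustration, not part of Julia or Matlab.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def wrap64(n: int) -> int:
    # Reduce modulo 2^64 into the representative range [-2^63, 2^63 - 1].
    return (n - INT64_MIN) % 2**64 + INT64_MIN

def sat64(n: int) -> int:
    # Clamp to the 64-bit range, as saturating arithmetic would.
    return max(INT64_MIN, min(INT64_MAX, n))

a = 2**62
# The true value of (a + a) - a is a, well inside the range,
# but the intermediate a + a = 2^63 does not fit.
wrapped = wrap64(wrap64(a + a) - a)    # intermediate wraps to -2^63, final result is exact
saturated = sat64(sat64(a + a) - a)    # intermediate clamps to 2^63 - 1, final result is a - 1
```

Here the wrapped computation recovers `a` exactly even though an intermediate step overflowed, while the saturating one is off by one despite the true answer being in range.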

9 Likes

and unless the technician is willing to grind the patient up and run them through a distillation column it’s unlikely they will act on a wrong result giving a negative dose

3 Likes

I love how you subtly imply that correlation may have something to do with causation :stuck_out_tongue:
We all know that frivolous usage of integer/fp arithmetic may lead to mathematically wrong answers (in any language), so I really don’t understand what the point is. I was still on board when @tim.holy summarized the story. By now I’m lost in the somewhat eristic discussion that followed…

Anyway, my takeaway from the discussion is:

  1. A --check-overflow flag would be nice to have and is possible, but it requires real work: a commitment of time, effort and resources from someone skillful and/or the engagement of an industrial partner.
  2. Rust has different operators for when overflow is intended. I rather like this solution. Maybe a small point to be remembered for Julia 2.0 (which is still way below the horizon :wink: )

6 Likes
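
As a rough illustration of the second point, the Python sketch below separates an addition that wraps on purpose from one that refuses to overflow. The names are loosely modeled on Rust's `wrapping_add` and `checked_add` methods, but the functions here are hypothetical, and raising an exception is a choice made for this sketch (Rust's `checked_add` returns an `Option` instead).

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def wrapping_add(a: int, b: int) -> int:
    # Overflow is intended: reduce the sum modulo 2^64 into the signed range.
    return (a + b - INT64_MIN) % 2**64 + INT64_MIN

def checked_add(a: int, b: int) -> int:
    # Overflow is an error: refuse to return a value outside the signed 64-bit range.
    result = a + b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a signed 64-bit integer")
    return result
```

With explicit names, the caller states at each call site whether wraparound is wanted, which is the property the list item above is pointing at.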

For those interested, John Regehr has written about fast overflow checking, concluding:

In summary, signed integer overflow checks in LLVM are fast but they get in the way of the optimizers, which haven’t yet been systematically taught to see through them or eliminate them. There’s plenty of low-hanging fruit, and hopefully we can get the overhead down to the point where people can turn on overflow checking in production codes without thinking too hard about the tradeoffs.

4 Likes

From a curious bystander: how compliant is Julia with regard to ISO/IEC 10967?

Wow you are so young and optimistic—it was more like 15 bits using assembly! To their credit, NASA also lost a Mars orbiter due to units mismatch.

6 Likes

Also, an Ariane 5 famously crashed due to 16-bit integer overflow.

5 Likes

One of the drawbacks of the default double approach is:

> x = 2^60 + 100; y = 2^60; sqrt(x-y)
[1] 0
> 
5 Likes
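
The same effect can be reproduced outside R. The Python sketch below is only an illustration of the post above: it contrasts the computation in 64-bit doubles with exact integer arithmetic. The variable names are chosen for this example.

```python
import math

x = 2**60 + 100
y = 2**60

# Near 2^60 adjacent doubles are 256 apart, so the +100 is rounded away on conversion.
as_double = math.sqrt(float(x) - float(y))   # 0.0
# Integer subtraction keeps the difference exactly; isqrt is the integer square root.
as_integer = math.isqrt(x - y)               # 10
```

Both operands and their difference fit in a signed 64-bit integer, so the exact result here does not depend on Python's arbitrary-precision integers.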