Hello JuliaLang Community!
I am a social scientist making his way from the Python ecosystem (mostly numpy & pandas) to Julia, and have in general been delighted by the language – concise, generally quite readable, and performant without the need to mix in numba and Cython code all the time. But I have discovered one aspect of Julia that makes me exceedingly uncomfortable, both as a researcher and an instructor: the lack of integer overflow checking.
As a researcher, my view is that a good language should (a) always give me the right answer, and (b) do so quickly. But crucially, speed should never come at the cost of accuracy, and I think the lack of int overflow checking violates that principle, and introduces two very distinct problems.
The main problem with a lack of int overflow checking, in my view, is that it means I don’t know that I can trust third-party packages. Int overflows occur because of an interaction between the data being used and the operations being executed. In my own code, I can control both, and thus manage the problem. But the idea of packages is that I don’t know all the details of how everything’s being implemented, and at the same time, the package writer can’t know all the details of the data I’m working with. As such, I don’t know how I can possibly be sure that int overflows aren’t happening somewhere in packages I use. And if I can’t trust that, I don’t know how I can trust Julia.
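To make the failure mode concrete, here is a minimal sketch of what silent wraparound looks like in plain Julia (no packages involved – this is just default `Int64` arithmetic):

```julia
# Int64 arithmetic wraps silently on overflow – no error, no warning.
x = typemax(Int64)   # 9223372036854775807
y = x + 1            # wraps around to typemin(Int64): -9223372036854775808

# The same thing hides inside innocuous-looking expressions:
2^63                 # also -9223372036854775808, not ~9.2e18
```

Nothing signals that anything went wrong; the program keeps running with a negative number where a huge positive one belongs.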
In software development, this isn’t necessarily a problem – if the program works, the program works. But in scientific computing, it’s entirely possible for code to run and generate answers that are just wrong in non-obvious ways. After all, if we knew what the result of our calculations should be in advance, we wouldn’t be doing them.
My second concern with the lack of int overflow checking stems from my experience as an instructor. The beauty of Julia is that it offers C-like performance to users who only need to understand Python-like syntax, basic type concepts & type stability, and a few basic principles (e.g., put everything in a function).
Right now, I think this community is full of C programmers and early adopters who are accustomed to thinking about things like integer overflows, but as Julia grows, the language will have more and more users who don’t even know what integer overflow means (indeed, this has been brought up elsewhere). Having that giant pitfall – one that may silently result in Julia returning answers that are just wrong, with no warnings – feels like a huge disservice to the community the language could potentially support.
A last thought: I imagine that one reason int overflows aren’t checked by default is that 2^63 is pretty big, so overflow rarely comes up. In general computing, I think this is true, but scientific computing is precisely the context in which people work with weird numbers (both mathematicians doing simulations and people working with exceptionally large data). Moreover, I know I often downcast my data into Int32 or Int16 to conserve memory, which is often the binding constraint with the datasets I work with (and I imagine this may be true for many Julia users).
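The downcasting case makes overflow far more likely in practice: `Int16` tops out at 32,767, so perfectly ordinary values can wrap. A small sketch:

```julia
# Int16 overflows at typemax(Int16) == 32_767; addition wraps silently.
a = Int16(30_000)
b = Int16(10_000)
a + b                  # -25536, not 40000

# Elementwise operations on a downcast array inherit the same behaviour:
v = Int16[30_000, 30_000]
v .+ Int16(10_000)     # both entries wrap to -25536
```

So the memory savings from downcasting come bundled with a much smaller headroom before silent wraparound.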
Indeed, it looks like this comes up not just for downcast data, but also for people working on 32-bit machines and in situations where Julia deliberately uses Int32.
I will not pretend to have the expertise to speak to the size of the performance hit, and I’m sure it’s non-trivial, though from what I’ve read maybe not that bad. But at the same time, for a language written for research, it seems like being right should always be the first criterion, and only after that criterion has been met should we be optimizing for speed.
Moreover, I’m all for offering unchecked integer operations as an option. But by making them non-default, hopefully people who invoke them will do so deliberately, with some serious thought as to what problems the decision may cause.
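For what it’s worth, Julia does already ship an opt-in checked mode in `Base.Checked` (documented in the manual) – it’s just not the default behaviour of `+`, `*`, etc. A quick sketch of what opting in looks like today:

```julia
using Base.Checked: checked_add

checked_add(1, 2)   # 3, same result as +

# Where + would silently wrap, checked_add raises instead:
overflowed = try
    checked_add(typemax(Int), 1)
    false
catch err
    err isa OverflowError   # true – the overflow is surfaced, not swallowed
end
```

My suggestion is essentially to flip this around: make the checked behaviour the default, and leave the wrapping versions available for those who explicitly want them.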