I really have to push back on this. There are so many ways that R makes it easy to inadvertently write incorrect code and not get any indication that anything is wrong. The designers have chosen over and over again to give anything that’s syntactically valid some meaning for the sake of occasional convenience even if that meaning is quite surprising most of the time.
For example, what happens if you accidentally do vectorized operations on two vectors that don’t have the same length? Think you get an error? You do not. R “recycles” the shorter one for you, hiding your programming error. This was such a footgun that they finally added a warning if the length of the longer one isn’t a multiple of the length of the shorter one, but this still throws no error:
> c(1, 2, 3, 4) + c(5, 6)
[1] 6 8 8 10
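For the record, here's the version that does at least warn, when the longer length isn't a multiple of the shorter one. You still get a result and it's still not an error; this transcript is from memory, so the exact warning wording and wrapping may vary slightly:

> c(1, 2, 3) + c(5, 6)
[1] 6 8 8
Warning message:
In c(1, 2, 3) + c(5, 6) :
  longer object length is not a multiple of shorter object length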
This is even worse if you do logical indexing because then you don’t get a warning even if the length of the index vector doesn’t divide the length of the vector being indexed:
> v = c(1:5)
> x = c(T, F, T)
> v[x]
[1] 1 3 4
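If it's not immediately obvious where that 4 comes from: the three-element logical index is silently recycled out to the length of v, so the lookup above is effectively this (same session):

> v[c(T, F, T, T, F)]  # x recycled from length 3 to length 5
[1] 1 3 4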
Look at that result again: there's a 4 in it that has no business being there, and not a peep from R. Now, what if you're computing an index and you accidentally get its sign wrong? Let's say you write i <- a-b instead of i <- b-a somewhere, and i, which was supposed to be 2, ends up being -2 instead. Then you use that index to index into a vector. Do you think you get an error? Nope. You get all the elements of the vector except the one at index 2! Fun. Of course, there's some code where that will throw an error, but far from all code:
> v = c(5, 4)
> i = -2 # should have been 2
> v / v[i]
[1] 1.0 0.8
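To see the silent substitution happening in that last expression, continue the same session:

> v[i]   # -2 means "drop element 2", so this is just v[1]
[1] 5
> v[2]   # the divisor the code was actually supposed to use
[1] 4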
Oops, the correct answer was c(1.25, 1.00). Same shape, wrong values, so you'll be none the wiser that you made a mistake. This example combines the last two "let's not bother the user, let's just guess what they meant" gotchas into one silent failure. Here's another fun one:
> x <- 10.2 * 100
> x
[1] 1020
> as.integer(x)
[1] 1019
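In case it's not clear what happened there: 10.2 isn't exactly representable in binary floating point, so x is actually a hair below 1020; the REPL rounds it to 1020 for display, and as.integer then truncates toward zero rather than rounding. Continuing the session:

> 1020 - x    # x is just below 1020; the print above was rounded for display
[1] 1.136868e-13
> round(x)    # rounding instead of truncating gives what you probably wanted
[1] 1020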
Famously, when doing a join on factors in data frames, the join is done on the internal integer representation of the factors in each of the data frames. Why is that bad? Because there’s no reason those numbers would have anything to do with each other: the same string in each data frame will typically have a totally different factor index. That’s right: the join isn’t done on the string values of the factors but the indices that happen to be used to represent them internally. So the result will look reasonable but be complete and utter nonsense. Why is this famous? Because this feature pissed Wes McKinney off so much that he ditched R entirely and created Pandas.
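The "totally different factor index" part is easy to verify yourself: the integer codes depend entirely on which levels happen to be present, so the same label can get a different code in each data frame. A minimal illustration of the codes themselves (not the join):

> f1 <- factor(c("apple", "banana"))
> f2 <- factor(c("banana", "cherry"))
> as.integer(f1)   # "banana" is coded as 2 here...
[1] 1 2
> as.integer(f2)   # ...but as 1 here
[1] 1 2

Match on those codes and "banana" in one table lines up with "cherry" in the other.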
All of these gotcha examples don't even get into R's unusual and surprising scoping and evaluation semantics. If you want a really great rundown of those, I cannot recommend Jan Vitek's talk on trying to make R run fast highly enough. (A talk about strange programming language features and optimizing compilers? It doesn't get any better than that for me.) The money quote from this talk is:
I haven’t seen a language that is as hard to optimize as R. Somebody should give the R designer that particular prize as probably the hardest language to optimize, at least of the ones I’ve encountered and played with. Compared to R, JavaScript is a beauty of elegance and good design.
But the real question is why is R so hard to optimize? The answer is that it has features that make it basically impossible for a compiler to analyze code and understand what it does. And if a compiler can’t understand what R code does, then what chance do humans have?
Now I’m sure some people will read this and accuse me of bashing R, and to some extent I am, but only because of this repeated insistence that R is clearly much more reliable than Julia. Like Jan right after that quote, I want to acknowledge that R is an incredibly useful and successful language, and one that I’ve used myself to great effect. It does, however, have a deeply YOLO design philosophy: when presented with situations where the user might have meant different things, rather than raise an error and ask them to be more explicit about what they meant, R will invariably pick some behavior, because it might be convenient some of the time. If you keep doing that when designing a language, you are not going to end up with a language that can be used to write reliable programs.
In contrast, Julia takes a very different stance in such situations: if you don’t know what the user intended, then don’t guess—raise an error and make them choose. That is a fundamental philosophical stance of the language. Libraries may do their own thing, of course, but I think we’ve managed to impart this philosophy to the ecosystem pretty successfully. Julia may not throw errors at compile time, but we don’t play around with guessing what someone meant when it’s ambiguous.
To bring this back to the subject at hand: yes, Julia has a bit of an ambiguity in that we haven’t forced people to be explicit about whether they intend for integer operations to wrap or not. But it’s also really easy to avoid integer overflow on 64-bit systems: basically every example where it happens involves ^. It’s also the case that since native integers do modular arithmetic, there are strong guarantees on the correctness of integer results: if the correct result is in the range of values that Int can represent, then the answer you get will be exactly correct, even if there is intervening overflow or underflow (which is pretty cool!). That means that unless you are computing astronomically large values, you’ll be fine. And if you’re doing that, use floats. So yes, this is a bit of an R-like YOLO choice, but it’s a very small, contained one. Especially when compared with how hard it is to be sure that numerical computations using floating-point are correct.