This I do not agree with. Suppose you have a problem of “size” n. Roughly speaking, you have two options: you can interpret your code and get time complexity cf(n), where c\sim10 and f(n) is some monotonically increasing function (usually at least linear, and frequently polynomial), or you can compile and get f(n)+\epsilon, where \epsilon\sim100~\mbox{ms}. For the vast majority of applications it should be obvious that you want to compile. Not only does compiled code scale far more reasonably, but if you re-run the same code you no longer pay the \epsilon at all.

When I see people complain that something is “sluggish” because they had to wait for something to compile, it makes me want to scream (not saying that was the case in this thread, just a general comment). The end-user “experience” should be efficient compiled code, not a completely inappropriate scaling behavior that is an artifact of a language excellent for its original purpose (scripting), where compile time sometimes matters, but totally inappropriate for everything else.

Also, my understanding is that the Julia devs have not been working on compilation latency much at all, because it’s something that can always be improved later without breaking changes, so things will only improve. I think this is an excellent approach. (Sorry for my unnecessarily elaborate rant on this issue, but I know we are going to have problems with people unfavorably comparing Julia with R and Python because of compile times, as I’ve already heard a fair bit of this, and it has to be one of the ultimate face-palms in programming.)
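To make the tradeoff concrete, here is a minimal Julia sketch (the function `sumsq` and the workload size are just placeholders I made up for illustration): the first call pays the one-time compilation cost \epsilon, and every subsequent call runs at compiled speed.

```julia
# Hypothetical example: a simple loop-heavy workload.
function sumsq(n)
    s = 0.0
    for i in 1:n
        s += i^2
    end
    return s
end

@time sumsq(10^8)   # first call: compilation (ε) + run time
@time sumsq(10^8)   # subsequent calls: run time only, ε is gone
```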
Anyway, it’s all a moot point here because, as you’ve said, your data had 10^8 rows.
This I do agree with. I don’t know what the core issue is, if there even is one. Perhaps I’m just misinterpreting things; it’s been a while since I’ve done any real benchmarking on this myself.