It’s a good point. Perhaps the better question is whether there’s an implementation in Julia that achieves the same purpose, but better.
I think the ecosystem for doing distributed computation on very large data sets is not particularly well-developed in general. Spark can be really unintuitive and unpleasant to use, and the other tools geared toward large data (SAS, for instance) are even worse.
For a long time now I’ve wanted to build something that handles econometrics out-of-core and in a distributed way, for data too large to fit in memory. JuliaDB sort of handles out-of-core data, but I don’t think it’s robust in the ways I’d want for econometric work. I’d have to really work with it to get it to handle my kinds of operations.
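To make that concrete, here’s roughly what the out-of-core path looks like in JuliaDB today. This is just a sketch: the `data/` directory, the `:firm` and `:revenue` columns, and the chunk counts are all placeholders, and it assumes JuliaDB and OnlineStats are installed.

```julia
using Distributed
addprocs(4)                          # local workers; chunks are processed in parallel
@everywhere using JuliaDB, OnlineStats

# Passing `output` makes loadtable ingest the CSVs into binary chunks
# on disk instead of holding the whole table in RAM (out-of-core mode).
t = loadtable("data/"; output = "bin", chunks = 16)

# Out-of-core aggregation has to go through reductions that can be
# computed chunk-by-chunk, e.g. OnlineStats accumulators:
m = groupreduce(Mean(), t, :firm; select = :revenue)
```

Even this simple pattern gets awkward once the operation isn’t a chunkwise reduction (think fixed-effects regressions or anything needing multiple passes over the data), which is exactly where I’d want more robustness.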
Another question is whether Julia is actually the right language to build this stuff in. Yes, it’s fast, has good typing, and does a whole bunch of things well, but it’s not really the kind of systems programming language I imagine you’d need to build a Spark competitor. Spark is a really good system, in that it’s quite robust and moderately easy to scale as long as you have good infrastructure. It would be a big undertaking to write something similar in Julia.
I would like to see something like it, where everything is intuitive and just works. We could probably get that out of something based in Julia, but it would be a huge effort. It’d need industrial buy-in and a lot of developer-hours.
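For what it’s worth, the raw primitives are already in the standard library. Here’s a toy sketch (the chunking scheme is purely illustrative) of the scatter/gather style that Distributed gives you out of the box, which is the kind of building block a Spark-like layer would compose:

```julia
using Distributed
addprocs(2)

# Scatter four chunks of work across the workers, then gather
# the partial results back on the master process.
partial_sums = pmap(1:4) do chunk
    lo = (chunk - 1) * 250_000 + 1
    hi = chunk * 250_000
    sum(i^2 for i in lo:hi)
end

total = sum(partial_sums)    # == sum(i^2 for i in 1:1_000_000)
```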
I should note that we have a lot of the skeleton already: