This is a compelling argument, but I don’t think it holds in practice (yet) outside of niches like SciML. For better or worse, many ML research papers just don’t care about training time, instead choosing to focus on (theoretical) advantages or evaluation metrics while throwing more compute/data at the problem (case in point, anything that uses a TPU cluster). The minority of publications that do care either
a) overlap with SciML and neural diffeqs already,
b) are concerned with (device-side) inference (which Flux/Knet don’t address at all), or
c) are engineering-focused and written by authors with plenty of C++/CUDA experience.
Now, let me play devil’s advocate and address that blog post specifically. Here’s a question: how many of those points does JAX/XLA not handle already? The only one I can think of is custom kernels, but again the trend of current ML research is not to implement custom kernels. On the other hand, Flux’s TPU support is now completely out of date and automatic batching practically does not exist. Likewise, broadcast fusion is great but ML-specific optimizers like XLA can go beyond that and pull off even more aggressive optimizations.
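For anyone unfamiliar with the term, here is a rough sketch of what broadcast fusion buys you, written in plain Python rather than Julia or XLA. The function names `unfused` and `fused` are made up for illustration. The point is only that computing `a*x + b` elementwise step by step allocates a temporary, while the fused version does it in one pass (which is what Julia's dot syntax `a .* x .+ b` gives you). Optimizers like XLA can apply the same idea across much larger chunks of a model.

```python
def unfused(a, x, b):
    # Two passes over the data, with a temporary list in between.
    tmp = [ai * xi for ai, xi in zip(a, x)]       # pass 1: allocates tmp
    return [ti + bi for ti, bi in zip(tmp, b)]    # pass 2: reads tmp back


def fused(a, x, b):
    # One pass, no intermediate list: multiply and add per element.
    return [ai * xi + bi for ai, xi, bi in zip(a, x, b)]


a, x, b = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(unfused(a, x, b))
print(fused(a, x, b))
```

Both functions return the same values; the difference is only in how many times memory is traversed and how many temporaries are created.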
Here, the obvious counterpoint is that Julia should outperform Python. However (I can’t speak for Knet), Flux is consistently slower than both PyTorch and TensorFlow on training and inference on common datasets (e.g. ImageNet).
Let me be clear that none of these points are an indictment of Julia the language, its potential or the community. The reality is that it took a lot of developer hours to get the Python DL frameworks to where they are now and the Julia ML/DL ecosystem hasn’t yet had nearly as much time poured into it. Will we get there? I hope so! Are we there now? Probably not.
It’s not off topic: at the most recent conference where I presented, a whole bunch of presenters were using geometric algebra for neural networks. I don’t believe they were using the Julia language, though.
Geometric algebra is generalized multi-linear algebra, so of course it can be applied to anything that uses linear algebra, including AI or whatever.
Your whole post and this part is the best answer I’ve gotten so far:
In theory Julia and its packages will be as fast. I see no reason Julia should be slower than other languages, and I’ve seen it benchmark faster (it may have been with Knet). I’m just not sure how realistic those benchmarks were; at the very least they weren’t done at scale or on multiple GPUs.
I’m not sure Julia should be faster than the alternatives for most of the work I see. Yes, SciML can be 100x faster than alternatives where it applies; I’m just not sure where, if at all, that applies to mainstream ML.
As I said, I’m pinning my hopes on new algorithms/architectures, but the alternatives are already down to 1-bit, so at that level Julia probably can’t beat the speed others get with cleverer datatypes. The algorithmic complexity translates to any language, but if something is done first in Python/C++, people may just stay there.
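To make the 1-bit point concrete, here is a minimal sketch, assuming weights and activations restricted to +1/-1 and packed into the bits of an integer. The helper names `pack_signs` and `binary_dot` are hypothetical, not taken from any library. A dot product then reduces to an XNOR plus a popcount, which is why there is not much left for a language to speed up at that level.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] is +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors of length n, given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask    # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n               # matches minus mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_signs(a), pack_signs(b), len(a)))
print(sum(x * y for x, y in zip(a, b)))  # naive reference, same value
```

The same trick works in any language with bitwise operations, so the advantage goes to whoever has the best-tuned kernels rather than to the language itself.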
Python is a very simple language, with a lot of consolidated libraries in the area (Scikit-learn, NumPy, PyTorch, Keras, OpenCV, …). The majority of pre-processing algorithms are also available in Python. The Julia equivalents are, obviously, not as complete yet. Also, many researchers in the AI area are reluctant to learn another programming language (Python is the easier one).
Also, performance is not important for many people (when training/evaluation takes most of the time, the performance of the rest of the system doesn’t matter much).
Another important factor is how widespread the language is. In research you want to collaborate and make your algorithm/technique available. Python is much better known than Julia, so your proposal will have more influence if you make it available in Python.
Yup, I think this is possible with current CUDA.jl as well. However, multi-GPU training of the same model (whether on the same machine or across machines) requires functionality that isn’t implemented anywhere in the Flux ecosystem yet.
If it’s well-written Julia, then yes. In some cases it can be faster (if Python-C interop creates bottlenecks). Also, using TensorFlow or most other Python deep learning libraries has a two-language problem even if you’re only writing Python. Python+TensorFlow is 1000% uglier than pure Python.
This is the elephant in the room. PyTorch is really good for many (not all) AI research tasks because it is mature, stable, fast, flexible, has lots of online help, and has cool features like mixed-precision training and GPU parallelism.
I am backing the Julia horse but it’s a bet I expect to pay off over a few years once the Julia ML ecosystem matures. The answer to the question in the title is that the Python ML ecosystem is better right now for most use cases, plus some degree of inertia as others have mentioned. It doesn’t hurt our chances, or dismiss the amazing contributions thus far, to admit that Python has its charms.
I don’t know; maybe you can get all the important (speed) benefits of PyTorch, without giving up any Julia benefits, with:
EDIT: There’s already (available since at least April):
The short answer to the question below was “Yes and Flux”, I also link directly to a more detailed answer:
I’m not sure how good https://github.com/boathit/JuliaTorch is. It’s a wrapper, but I tried to install it, and I see now that it’s not yet a proper package (so not registered either), so you have to git clone or download it.
I guess it would be nice to have (with easy installation), though I’m not sure you would be using Julia to its full potential with it (or that you could mix it with Julia’s own frameworks). So I think a migration to a Julia-only solution (e.g. maybe with the other registered package above?) should be on people’s radar.
I think this is the easy answer, yet not necessarily the right one, or at least not the whole story. PyTorch started in September 2016. I remember using version 0.3.x, which was released less than 18 months after the start. Flux started around May 2016, according to its first commit. Do you find it as usable as versions 0.3 or 0.4 of PyTorch?
My point isn’t the exact dates, but that the resources spent on PyTorch in 18 months are probably achievable in 4 years of Flux. So I’d expect the maturity and capabilities to be similar, especially if, as claimed, Julia makes developers more effective due to its features. Flux is also developed by very capable Julians, so one would expect them to take full advantage of those capabilities.
I think it has to do with the goals set for Flux. PyTorch seems to me like a pragmatic approach, getting better with each step. Flux seems to try to take many steps forward at once. I might be wrong in my analysis, but looking from the side it looked just like the Marshmallow Challenge.
I found the early Knet to be much more intuitive and PyTorch-like. Unfortunately it doesn’t get enough of the spotlight. Moreover, it later tried to be more Flux-like. I wish it were just a PyTorch clone: simple and straight to the point.
In my subjective opinion, Python is, as someone mentioned here, a much easier language than Julia. Coming from MATLAB, looking at Julia packages seems to me like an act of wizardry. It has to do with the Julia community being made up of highly capable programmers. Sometimes, to me, things seem too elegant yet not simple.
I have never encountered MATLAB code I couldn’t understand (MATLAB is, to me, the easiest language out there). I can handle some Python code despite having fewer hours in Python than in Julia. Yet most code in Julia’s packages seems like black magic to me. The same probably goes for many other engineers who are very good at what they do yet only as good at programming as they need to be. They find Python to be a more welcoming tool, not only a more popular one (it is popular due to its simplicity).
PyTorch is largely a port of the existing Lua Torch package to Python, as I understand it. Torch development started in 2002, 10 years before even Julia was around, much less Flux.
I’d be interested to see some examples of code that you find confusing in Julia packages. Coming from programming in Python (and MATLAB before that), I have found that in general my mental model of what a given piece of code is doing is simpler in Julia than in Python, which is itself more straightforward than MATLAB.