Why is Python, not Julia, still used for most state-of-the-art AI research?

This is a compelling argument, but I don’t think it holds in practice (yet) outside of niches like SciML. For better or worse, many ML research papers just don’t care about training time, instead choosing to focus on (theoretical) advantages or evaluation metrics while throwing more compute/data at the problem (case in point, anything that uses a TPU cluster). The minority of publications that do care either
a) overlap with SciML and neural diffeqs already,
b) are concerned with (device-side) inference (which Flux/Knet don’t address at all), or
c) are engineering-focused and written by authors with plenty of C++/CUDA experience.

Now, let me play devil’s advocate and address that blog post specifically. Here’s a question: how many of those points does JAX/XLA not handle already? The only one I can think of is custom kernels, but again the trend of current ML research is not to implement custom kernels. On the other hand, Flux’s TPU support is now completely out of date and automatic batching practically does not exist. Likewise, broadcast fusion is great but ML-specific optimizers like XLA can go beyond that and pull off even more aggressive optimizations.
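To make the broadcast-fusion point concrete, here is a toy sketch in plain Python (not JAX's or Julia's actual API): unfused elementwise ops materialize an intermediate array per op, while a fused version does one pass with no intermediates. This is what Julia's dotted broadcast fusion does syntactically, and what XLA does automatically (and more aggressively) at the graph level.

```python
# Unfused: each op allocates a full intermediate (extra memory traffic).
def unfused(xs):
    t1 = [x * 2.0 for x in xs]          # materialize x*2
    t2 = [t + 1.0 for t in t1]          # materialize x*2 + 1
    return [max(t, 0.0) for t in t2]    # relu

# Fused: one pass over the data, no intermediates -- the form a
# fusing compiler rewrites the above into.
def fused(xs):
    return [max(x * 2.0 + 1.0, 0.0) for x in xs]
```

Both compute the same result; the fused version simply touches memory once per element, which is where most of the win comes from on bandwidth-bound elementwise chains.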

Here, the obvious counterpoint is that Julia should outperform Python. However (I can’t speak for Knet), Flux is consistently slower than both PyTorch and TensorFlow on training and inference on common datasets (e.g. ImageNet).

Let me be clear that none of these points are an indictment of Julia the language, its potential or the community. The reality is that it took a lot of developer hours to get the Python DL frameworks to where they are now and the Julia ML/DL ecosystem hasn’t yet had nearly as much time poured into it. Will we get there? I hope so! Are we there now? Probably not.

8 Likes

It’s not off topic: at the most recent conference where I presented, there were a whole bunch of presenters using geometric algebra for neural networks. They weren’t using the Julia language, though, I believe.


Geometric algebra is generalized multi-linear algebra, so of course it can be applied to anything that uses linear algebra, including AI or whatever.
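As a toy illustration of that point (my own, not from the talks mentioned): in 2D geometric algebra, the geometric product of two vectors splits into a symmetric part (the ordinary dot product, grade 0) and an antisymmetric part (the wedge product, grade 2), so it contains familiar linear algebra as a special case.

```python
# Geometric product of two 2D vectors a = (a1, a2), b = (b1, b2):
# ab = a.b + a^b, returned as (scalar, bivector coefficient).
def geometric_product_2d(a, b):
    dot = a[0] * b[0] + a[1] * b[1]    # symmetric part (grade 0)
    wedge = a[0] * b[1] - a[1] * b[0]  # antisymmetric part (grade 2)
    return dot, wedge
```

For example, a vector times itself gives a pure scalar (its squared length), and the product of the two basis vectors e1 and e2 gives the unit bivector.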

4 Likes

Your whole post and this part is the best answer I’ve gotten so far:

In theory, Julia and its packages will be as fast (at least I see no reason Julia should be slower than other languages, and I’ve seen it benchmark faster, possibly with Knet; I’m just not sure how realistic those benchmarks were — at the least, they weren’t done at scale or multi-GPU).

I’m not sure if Julia should be faster than alternatives for most of the work I see. Yes, SciML can be 100x faster than alternatives where it applies; I’m just not sure where that applies to mainstream ML, if at all.

As I said, I’m pinning my hopes on new algorithms/architectures, but the alternatives are already down to 1-bit precision, so at that level Julia can probably not beat the speed others achieve with more clever datatypes. The algorithmic complexity translates, but if something is done first in Python/C++, people may just stay there.
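For context on what “down to 1-bit” means, here is a hedged sketch (in the style of BinaryNet/XNOR-Net, not any particular library’s API) of sign-based weight quantization: each weight becomes ±1 times a single per-tensor scale, here the mean absolute value.

```python
# 1-bit (sign) quantization sketch: weights -> {-1, +1} * scale.
def binarize(weights):
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return signs, scale

# Reconstruct the (lossy) approximation of the original weights.
def dequantize(signs, scale):
    return [s * scale for s in signs]
```

The point for the speed argument: once weights are single bits, the heavy lifting is bit-twiddling (XNOR + popcount), and the language hosting the kernels matters far less than the kernels themselves.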

2 Likes

Yesterday I downloaded some scripts that the French tax ministry released showing how taxes are computed (it is so complicated in France that you can’t compute or check the tax by yourself…)

It turned out to be a bunch of COBOL scripts (2020)…

3 Likes

I think the answer is very simple:

Python is a very simple language, with a lot of consolidated libraries in the area (scikit-learn, NumPy, PyTorch, Keras, OpenCV, …). The majority of pre-processing algorithms are also available in Python. The Julia equivalents are, obviously, not so complete yet. Also, many researchers in the AI area are reluctant to learn another programming language (Python is the easier one).

Also, the performance topic is not important for many (when training/evaluation dominates the runtime, the performance of the rest of the system matters less).

Another important factor is the spread of the language. In research you want to collaborate and make your algorithm/technique available. Python is much more widely known than Julia, so you will have more influence making your proposal available in Python.

5 Likes

FYI: I see multi-GPU training done with Flux back in 2018 (though I guess not distributed, then):

Also interesting:

https://estadistika.github.io//julia/python/packages/knet/flux/tensorflow/machine-learning/deep-learning/2019/06/20/Deep-Learning-Exploring-High-Level-APIs-of-Knet.jl-and-Flux.jl-in-comparison-to-Tensorflow-Keras.html

On the Nvidia blog in 2017:

On average, the CUDAnative.jl ports perform identically to statically compiled CUDA C++ (the difference is ~2% in favor of CUDAnative.jl, excluding nn).

Incidentally, one of those countries is also a birthplace of Julia…

4 Likes

Nice finds!

Yup, I think this is possible on current CUDA.jl as well. However, multi-gpu training of the same model (whether on the same machine or across machines) requires functionality that isn’t implemented anywhere in the Flux ecosystem yet.

Seems like all of these benchmarks are prior to the Flux Zygote transition (I see lots of Tracker). Knet likely performs even better now (and should probably receive more love), but Tracker -> Zygote was a noticeable performance regression for certain workflows. ref. https://fluxml.ai/2020/06/29/acclerating-flux-torch.html, Flux vs pytorch cpu performance, https://github.com/FluxML/Flux.jl/issues/886.

For more holistic comparisons, see also
Is it a good time for a PyTorch developer to move to Julia? If so, Flux? Knet? and https://discourse.julialang.org/t/where-does-julia-provide-the-biggest-benefits-over-other-ml-frameworks-for-research (the latter was started by a PyTorch contributor).

1 Like

From your link, some research using Julia (and I also edited my top post with state-of-the-art research from one of the main Knet/Julia guys):

Will Julia be as fast as optimized C++ + Python in machine learning? The machine learning core parts in C++ are written by experts, so maybe they do not have the two-language problem?

If it’s well-written Julia, then yes. In some cases it can be faster (if Python–C interop creates bottlenecks). Also, even just using TensorFlow or most other Python deep learning libraries has a two-language problem, even if you’re only writing Python: Python+TensorFlow is 1000% uglier than pure Python.

PyTorch has had more development and resources. Flux.jl is still quite early days.

Any opinions on Knet vs Pytorch?

2 Likes

Are there any companies using Flux and Knet?

I wonder too. There must be, but I am not sure at what scale.

I kind of feel like the two-language problem is getting less important over time. Lower-level languages are getting easier to use. Rust and C++ with modules are very appealing.

2 Likes

As far as I know Invenia uses Flux in production.

6 Likes

This is the elephant in the room. PyTorch is really good for many (not all) AI research tasks because it is mature, stable, fast, flexible, has lots of online help, and has cool features like mixed-precision training and GPU parallelism.
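On the mixed-precision point, here is a hedged, pure-Python illustration of why it needs loss scaling (using the standard-library `struct` half-float format to emulate fp16, not any framework’s API): tiny gradients underflow to zero in fp16, but survive if the loss (and hence the gradients) is pre-multiplied by a scale that is divided back out in full precision.

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE half precision (binary16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

grad = 1e-8                     # below fp16's smallest subnormal (~6e-8)
naive = to_fp16(grad)           # underflows to exactly 0.0

scale = 1024.0
scaled = to_fp16(grad * scale)  # now representable in fp16
recovered = scaled / scale      # unscale in full precision: ~1e-8 again
```

Frameworks automate exactly this dance (scale the loss, unscale the gradients, skip steps on overflow); the sketch just shows the numerical reason it exists.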

I am backing the Julia horse but it’s a bet I expect to pay off over a few years once the Julia ML ecosystem matures. The answer to the question in the title is that the Python ML ecosystem is better right now for most use cases, plus some degree of inertia as others have mentioned. It doesn’t hurt our chances, or dismiss the amazing contributions thus far, to admit that Python has its charms.

11 Likes

I don’t know; maybe you can get all the important (speed) benefits of PyTorch (without giving up any Julia benefits?) with:

EDIT: There’s already (available since at least April):

The short answer to the question below was “Yes and Flux”, I also link directly to a more detailed answer:

I’m not sure how good https://github.com/boathit/JuliaTorch
is. It’s a wrapper, but I tried to install it, and I see now it’s not yet a proper package (so not registered either), so you have to git clone or download it.

I guess it is/would be nice to have it (with easy installation), though I’m not so sure you would use Julia to its full potential (nor am I sure you could mix it with Julia’s frameworks), so I think a migration to a Julia-only solution (e.g. maybe the other registered package above?) should be on people’s radar.

3 Likes

I think this is the easy answer, yet not necessarily the right one, or at least not the whole story.
PyTorch started in September 2016. I remember using version 0.3.x, which was released less than 18 months after the start.
Flux started around May 2016, according to the first commit. Do you find it as usable as versions 0.3 or 0.4 of PyTorch?
My point isn’t the exact dates, but that the resources spent on PyTorch in 18 months are probably something achievable in 4 years of Flux. So I’d expect the maturity and capabilities to be similar — especially if, as claimed, Julia gives developers the ability to be more effective due to its features. Flux is also developed by very capable Julians, so one would expect they can take full advantage of its capabilities.

I think it has to do with the goals set for Flux. PyTorch seems to me a pragmatic approach, each step getting better. Flux seems to try making many steps forward at once. I might be wrong in this analysis, but looking from the side it looked just like the Marshmallow Challenge.
I found the early Knet to be much more intuitive and “PyTorch-like”. Unfortunately it doesn’t get enough of the spotlight. Moreover, later it tried to be more Flux-like. I wish it were just a PyTorch clone: simple and straight to the point.

As my subjective take, I would also say Python, as someone mentioned here, is a much easier language than Julia. Coming from MATLAB, looking at Julia packages seems to me like an act of wizardry. It has to do with the Julia community being built by highly capable programmers. Sometimes, to me, things seem too elegant yet not simple.

I have never encountered MATLAB code I couldn’t understand (MATLAB is, to me, the easiest language out there). I can handle some Python code too, despite having fewer Python hours than Julia hours. Yet most code in Julia’s packages seems like black magic to me. It might be like that for many other engineers who are very good at what they do, yet only as good as they need to be at programming. They find Python to be a more welcoming tool, not only a more popular one (it is popular due to its simplicity).

4 Likes

PyTorch is largely a port of the existing Lua Torch package to Python, as I understand it. Torch development started in 2002, 10 years before even Julia was around, much less Flux.

I’d be interested to see some examples of code that you find confusing in Julia packages. Coming from programming in Python (and MATLAB before that), I found in general that my mental model of what a given piece of code is doing is simpler in Julia than in Python, which is itself more straightforward than MATLAB.

9 Likes