Why is Python, not Julia, still used for most state-of-the-art AI research?

I know e.g. ALBERT, a variant of BERT, has been recreated in Julia, and so has AlphaZero, but all the recent papers I read seem to be based on Python code, when the language is known to me (e.g. GPT-2 and the updated GPT-3). [SciML, merging neural networks and PDEs, is maybe the exception.] What (other) counterexamples are there?

Ok, often I’m not sure what language is used, e.g. for this interesting paper from July:

and: Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

My hope was that, since we need and are creating newer algorithms all the time, obsoleting older ones, people would write the new ones in Julia, and Python would simply become outdated that way.

Good blog:

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

The AI-generated blog’s popularity on the tech-focused Hacker News forum was taken as a measure that GPT-3 had managed to fool readers into thinking a human had written the posts.

A post written by GPT-3 made it to the top of the Hacker News forum

The post has indeed received 198 points and 71 comments. “What most commenters didn’t realize: The post was generated entirely by artificial intelligence,” Business Insider wrote.

EDIT: I did find this from the main guy behind Knet.jl (but the paper doesn’t mention whether Julia or another language was used):

Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. […]


The content of the PyTorch repo sheds some light on this:

(screenshot of the PyTorch repository’s language statistics)


From a thread about CSV:

It seems common that, for a paper to be taken seriously, you have to use the field’s current standard language (i.e. Python).


You imply it’s really C++; fair enough, but my main point was: why not Julia? Maybe you’re implying that the Python/C++ combination is good enough and Julia isn’t needed. I think most users use Python without knowing or caring that C++ powers it, and not just in AI. I suppose (some of the) research people really do work in C++, though.

Also, why not then use the https://github.com/boathit/JuliaTorch wrapper, rather than Python directly and C++ indirectly?

I don’t see why this is a mystery. Everybody uses python because everybody uses python. It’s what they’re used to, and everyone they collaborate with uses it.

There’s also a lot of sunk cost, lots of available tools, and changing language incurs a cost.

That’s how inertia works. It’s nothing to do with python or Julia, as languages.


E.g. binary networks (1-bit weights and activations) seem like they would have been easier to develop in Julia than in Lua (people had to migrate from one language to a newer one anyway), though it’s understandable they didn’t in 2016: https://arxiv.org/pdf/1602.02830.pdf

Torch7 and Theano framework […]
Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
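As a toy illustration of why 1-bit networks admit fast kernels like the one quoted above (my own sketch in Python/NumPy, not the paper’s actual GPU code): for vectors whose entries are all −1 or +1, a dot product reduces to an XOR (equivalently XNOR) plus a popcount on packed bits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Binarized weights and activations: every entry is -1 or +1.
w = rng.choice([-1.0, 1.0], size=n)
x = rng.choice([-1.0, 1.0], size=n)

# Ordinary floating-point dot product.
dot_float = int(w @ x)

# Same value via bit tricks: encode +1 -> 1, -1 -> 0 and pack into ints.
wb = int("".join("1" if v > 0 else "0" for v in w), 2)
xb = int("".join("1" if v > 0 else "0" for v in x), 2)

# Matching bits contribute +1, mismatching bits -1, so
# dot = n - 2 * popcount(w XOR x); XNOR + popcount counts the matches.
mismatches = bin(wb ^ xb).count("1")
dot_bits = n - 2 * mismatches

assert dot_bits == dot_float
```

A real kernel would do the XOR/popcount on machine words across whole matrix tiles, which is where the reported speedup comes from.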

In Python/PyTorch as late as 16 days ago:

This is the pytorch implementation of our paper “ReActNet: Towards Precise Binary NeuralNetwork with Generalized Activation Functions”, published in ECCV 2020.


I couldn’t agree more on this. People don’t care about the language, they just use whatever everybody around them is using already.


I do see people in the Julia community doing lots of research, mostly non-AI (and yes, SciML), and I know about the ML/AI Julia packages and infrastructure. I suppose they could be used by now instead of e.g. Python/C++, and maybe people are using them without everyone being aware of it; at least I’ve not seen any major original AI research done in Julia. I was wondering: do these packages currently have limitations that justify Python, or is it simply ignorance of the Julia alternatives and/or inertia?


I think the non-technical side of things has been summarized already, so let me add a couple of technical areas where the Python frameworks still have the edge:

  1. Distributed and multi-gpu training: this has unfortunately become a must for certain streams of research in CV, NLP and deep RL. CliMA is the only public project I know of that has distributed GPU support, but it’s not DL and the only way to do something similar for DL right now is to implement your own framework from scratch on top of MPI (or resuscitate NCCL.jl, but that’s even less likely).
  2. API coverage: for example, over half of my lab are MATLAB refugees who really only use Python for DL model training. They are not in a position to write custom kernels and rely on the provided kitchen sink to hack together new layers. You can see this browsing through the repos of many research papers: the publication promises a theoretical speedup over, say, quadratic self-attention, but the actual implementation is an inefficient tangle of whatever torch.tensor ops were necessary to get things running.
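To make the “kitchen sink” point concrete, here is roughly what such repos contain: a minimal NumPy sketch (my own, with hypothetical names, taken from no particular paper) of plain scaled dot-product self-attention, where the (n, n) score matrix is exactly the quadratic cost those papers claim to avoid.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def naive_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention built from basic array ops.

    The (n, n) score matrix makes this quadratic in sequence
    length n, which is the cost 'efficient attention' papers try
    to avoid -- while their repos often ship exactly this code.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n): the quadratic part
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 8, 4                        # tiny sequence length / model dimension
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = naive_self_attention(x, wq, wk, wv)
assert out.shape == (n, d)
```

Nothing here needs a custom kernel, which is exactly why this style is so common: it runs everywhere, it’s just not fast.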

Hmm, yes, multi-GPU was a blind spot for me. I see it done in Julia as far back as 2016, but not specifically for ANNs/DL:

Is it dead for sure? This repo, or the one at JuliaGPU (both updated recently)? https://github.com/vchuravy/NCCL.jl

There’s interesting work being done to scale NNs down, not just up (as with GPT-3), both for NLP and computer vision. Still, GPT-3 is huge (ALBERT, which I mentioned, is much smaller), so multi-GPU seems needed for sure (at least for good NLP now).

I’m curious: if the network itself doesn’t need to be that big (say it fits in one GPU’s memory), but the problem is the dataset/training, what happens if you split it 2 or N ways and train independently? Can you in general (or say for images only) combine two such trained networks? Isn’t that what people call minibatching? I could see it maybe not working for NLP.

And can you simply use:

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
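If I understand correctly, what Horovod-style frameworks do is not combine independently trained networks (averaging two fully trained nets’ weights generally doesn’t work, since hidden units can be permuted arbitrarily); instead they average gradients across workers on every step, which works out to the same update as one big minibatch. A toy NumPy sketch (made-up data, not Horovod’s API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = X @ w with a mean-squared-error loss.
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)
w = np.zeros(5)

def grad(Xs, ys, w):
    # Gradient of mean squared error for the linear model on shard (Xs, ys).
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

# "Single GPU": gradient on the full batch.
g_full = grad(X, y, w)

# "Two GPUs": split the batch, compute gradients independently,
# then all-reduce (here just an average). This is one step of
# synchronous data-parallel training.
g0 = grad(X[:50], y[:50], w)
g1 = grad(X[50:], y[50:], w)
g_avg = (g0 + g1) / 2

# With equal-sized shards the averaged gradient equals the
# full-batch gradient, so every worker takes the same step.
assert np.allclose(g_full, g_avg)
```

So the split happens per step, not per training run, and the workers stay in sync by exchanging gradients after each minibatch.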

E.g. Julia has had official MXNet support for a long time (though MXNet isn’t the most popular framework, for Julia or otherwise), and as I posted, there’s a PyTorch wrapper, while the TensorFlow one is a bit outdated.

There’s a bit more to it. Currently, out of a dozen or more colleagues, not one has any understanding whatsoever of what makes Julia compelling. If they had to guess, it’s that Julia allows writing optimized numerical code, maybe some kind of modern C++. As Viral and others have pointed out, it seems very difficult to communicate the advantages laid out in this thread, say in slides or a blog post. Actually, I am thinking about how best to do a presentation about it now.
Having said that, there still may be reasons to stick with python. For example, your end users only use computers because they have to and have no interest in programming.
So inertia may explain it. But, for the most part, people have no idea what they are missing.
I do find that people who have experience in more expressive statically compiled languages are painfully aware of some of the limitations of python, which at least allows you to imagine that Julia might have something to offer.


Yes, because they don’t care about the language they use, be it R, Python or Julia.


Yes, but I doubt that for “state-of-the-art AI research”. I could be wrong. If you meant more generally, or outside AI, say web programming, then sure; there I’m not sure Julia has a killer advantage. But for AI, and I really meant ML, I thought Julia had such an advantage, based on this post (I realize many ML people are doing fairly routine work, but I’m not thinking of those): https://julialang.org/blog/2018/12/ml-language-compiler/

Yes, you are correct. I was inadvertently broadening the topic. But, I’m not thinking of web frameworks, but rather other scientific computing. (But, I guess people who write web framework stuff in Julia, would have a similar story)


Agreed. On top of this, evangelism is both difficult and expensive. The major Python frameworks have corporations with extensive name recognition and oodles of cash backing entire conferences, YouTube channels with highly-produced videos, blogs and more. Even big corporate backing is not a guarantee of breakout success: Swift for TensorFlow remains somewhat obscure despite the initial buzz.

With all that said, let me plug https://github.com/JuliaCommunity/ML-Coordination-Tracker, where we’re trying to identify and fill in gaps in the ecosystem. If you’re interested in a more grassroots approach for increasing adoption, feel free to open an issue or hop on Zulip :slight_smile:


In my field of geometric algebra, my Julia package has 100 more stars than the equivalent Python package. However, pretty much all the open-source collaborative effort is going into the Python package, and none of it into the Julia package. So even though my Julia package is (apparently) more popular at this point, the Python package already has an established community of contributors (a group of us stay in contact on our communication server and share all our work across languages). However, I don’t actually mind that the rest of my geometric algebra developer friends are focusing on Python instead of Julia. We formed a unified geometric algebra community, which is independent of the choice of language. There are a bunch of developers doing it in other languages too, but for this discussion I’ll focus on Python vs Julia.

So, I think we can all get along, no matter what language each of us is using. In the geometric algebra community, I am happy there is such a diverse choice of languages, including Julia and Python. It’s great that there are people using Python independently, because it’s a separate effort I can keep learning from.

I don’t try to convince my algebra friends to switch to Julia anymore, what language they use is up to them and their situation. Yea, I will recommend Julia, but I don’t mind if they choose to neglect the Julia language.

The Python project is probably easier for them to contribute to because it doesn’t have obscure uses of metaprogramming features like the Julia one does, so it gives an easier entry point to a complicated project.

I am really good friends with the developers of the Python variant of my geometric algebra package, and talk to them by phone fairly often too (about science, not programming; it’s language-agnostic). Our shared language is mathematics.

Anyway, my point is that scientific collaboration should not put so much emphasis on the language choice. Instead, the most fruitful discussions happen if you create unified communities beyond only Julia. Let other people find their way to Julia on their own, whenever they are ready.


Your comment and my response seem off-topic regarding AI (still interesting, and maybe relevant in general about collaboration), but I’d really like to know more, having just seen the Hyperbolic Deep Learning idea/thread today.

I’m aware of you and your packages, and I must admit they’re above my pay grade. I understand physics at a level I’m OK with, but I suppose I need to look into e.g. Clifford (Grassmann, Lie?) algebras if I really want to understand the physics I want to know more about (e.g. spinors, twistors and Geometric Unity).

There are countries that still do not use the metric system, you figure it out.


CMU Common Lisp was available for x86 back in the mid ’90s. It had a LOT of the advantages that Julia has: a good compiler, macros, multimethods in CLOS, etc. Then the SBCL variant of CMUCL came along and it got more maintainable and improved further.

I mean, no question Julia has many advantages, such as a syntax that people like, and probably a fancier compiler that can specialize things based on types without type annotations; but in 1999 you could already write Common Lisp code that would blow Python out of the water, yet it never caught on in popular usage.

I think the answer is that high-end languages like Common Lisp and Julia are really for people with a high-end view of computing, and there are just far fewer of them than there are people in general who do some coding.

What’s the advantage of macros if you don’t really understand what they mean and how they work? What’s the advantage of a compiler that can do all kinds of type inference, so duck typing just works, if you don’t understand why that matters and you write concrete type annotations so everything has to be a Float64? Etc., etc.

A lot of what makes Julia so amazing is that it attracts amazing people who understand how Julia code could be made amazing.


This is a compelling argument, but I don’t think it holds in practice (yet) outside of niches like SciML. For better or worse, many ML research papers just don’t care about training time, instead choosing to focus on (theoretical) advantages or evaluation metrics while throwing more compute/data at the problem (case in point, anything that uses a TPU cluster). The minority of publications that do care either
a) overlap with SciML and neural diffeqs already,
b) are concerned with (device-side) inference (which Flux/Knet don’t address at all), or
c) are engineering-focused and written by authors with plenty of C++/CUDA experience.

Now, let me play devil’s advocate and address that blog post specifically. Here’s a question: how many of those points does JAX/XLA not handle already? The only one I can think of is custom kernels, but again the trend of current ML research is not to implement custom kernels. On the other hand, Flux’s TPU support is now completely out of date and automatic batching practically does not exist. Likewise, broadcast fusion is great but ML-specific optimizers like XLA can go beyond that and pull off even more aggressive optimizations.
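For concreteness on the fusion point, here is a hand-written illustration (plain Python/NumPy standing in; this is not what XLA or Julia’s broadcasting machinery actually emits) of the difference between an unfused elementwise chain, which materializes a temporary array per op, and the fused single-pass version a compiler would generate:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
a, b = 2.0, 0.5

# Unfused: each op materializes a full temporary array,
# so relu(a*x + b) makes three passes over memory.
t1 = a * x
t2 = t1 + b
unfused = np.maximum(t2, 0.0)

# Fused: one pass, no intermediate arrays. This is the rewrite
# that broadcast fusion in Julia, or an optimizer like XLA,
# performs automatically (and in compiled code, not a Python loop).
fused = np.empty_like(x)
for i in range(len(x)):
    fused[i] = max(a * x[i] + b, 0.0)

assert np.allclose(unfused, fused)
```

The point is that fusion is about eliminating memory traffic for intermediates; ML-specific compilers then layer algebraic rewrites and layout optimizations on top of that.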

Here, the obvious counterpoint is that Julia should outperform Python. However (I can’t speak for Knet), Flux is consistently slower than both PyTorch and TensorFlow on training and inference on common datasets (e.g. ImageNet).

Let me be clear that none of these points are an indictment of Julia the language, its potential or the community. The reality is that it took a lot of developer hours to get the Python DL frameworks to where they are now and the Julia ML/DL ecosystem hasn’t yet had nearly as much time poured into it. Will we get there? I hope so! Are we there now? Probably not.