Why is Python, not Julia, still used for most state-of-the-art AI research?

Much low-level optimization work is handled by compiler and systems people rather than by ML researchers. The ML systems community has done really great work on compiler optimizations such as loop fusion, polyhedral optimization, scheduling languages, sparse computation, and memory reuse. I think it would be really hard for non-experts or a general-purpose compiler to beat their work.

That wasn’t my point though? Julia does make use of that same low-level optimization work (e.g. LLVM, CUDA/CUDA libraries, Enzyme, etc.) and can make use of more of it (e.g. XLA).
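To make that concrete, here is a minimal sketch (my own illustration, not from the original post; assumes CUDA.jl and a CUDA-capable GPU): ordinary Julia broadcast code is lowered through Julia’s LLVM-based codegen straight to a GPU kernel, with no separate DSL:

```julia
using CUDA

# Plain Julia broadcasting; CUDA.jl compiles the fused expression into a
# single GPU kernel via Julia's LLVM pipeline, with no graph export step.
a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
c = @. tanh(a) * b + 1f0   # one fused kernel launch
```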

What I’m saying is that a lot of ML research follows this order:

  1. Make it work
  2. Make it fast enough
  3. Make it clean and/or actually fast

Oftentimes work will stop at step 1 because it’s enough to get a paper out. Organizations with plenty of free compute can also brute-force step 2, so there goes the incentive not to use a Python framework.

Currently, Julia’s ML/DL ecosystem is at a disadvantage for step 1 because there just isn’t as much code out there for people to draw on for their own purposes. It’s a similar story for documentation and community experience. We’re working on all three areas, but these things take time. Support for step 2 was essentially non-existent until very recently, and again that’s not something that’ll happen overnight. It’s not an easy fight, but I do believe we’ll get from “I could use this” to “I want to use this” for “mainstream” ML research eventually.


AI, and even ML, these days mostly means deep learning (DL) / neural networks, and I would like to know where we’re behind in ways that really matter, i.e. for doing step 1 right. Infrastructure-wise we’re behind (only?) in scaling to really large (neural) NLP models, but I’m not sure that matters too much, as the trend is toward working smarter, not harder, in NLP and other neural networks.

The goal should be one-shot or few-shot learning, and probably smaller/different models.

DL may be a dead end, but I think it more likely that something else is needed, with DL as one component of an AI system.

What I think we should be replicating (or where I would like to know whether Julia is a better fit) is, e.g., non-DL work:

A.
https://science.sciencemag.org/content/350/6266/1332

The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches.
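As a rough illustration of that Bayesian criterion (my own hypothetical sketch in Julia, not code from the paper; names like `Hypothesis` and `score` are invented): each concept is a small generative “program”, and hypotheses are scored by log-prior plus log-likelihood of the observed examples:

```julia
# Hypothetical sketch: represent a concept as a simple generative program
# and score it under a Bayesian criterion.
struct Hypothesis
    program::Function    # program(x) returns P(x | program)
    logprior::Float64    # prior favoring simpler programs
end

loglikelihood(h::Hypothesis, examples) =
    sum(log(h.program(x)) for x in examples)

score(h::Hypothesis, examples) = h.logprior + loglikelihood(h, examples)

# One-shot classification: pick the concept whose program best explains
# the single observed example.
classify(hypotheses, example) = argmax(h -> score(h, [example]), hypotheses)
```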

B.
@DrChainsaw, while there is NAS-related work, e.g.:

there’s also the newer Neural Architecture Transfer.
From the video: “NAT is consistently more efficient (3x-9x) than EfficientNet, across various datasets without losing accuracy.”

Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. […]
we propose Neural Architecture Transfer (NAT) to overcome this limitation […]
A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets. We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings

C.
https://arxiv.org/pdf/2105.09491.pdf

benchmarks show that Retentive R-CNN significantly outperforms state-of-the-art methods on overall performance among all settings as it can achieve competitive results on few-shot classes and does not degrade the base class performance at all. Our approach has demonstrated that the long desired never-forgetting learner is available in object detection.

D.
https://arxiv.org/pdf/2006.10738.pdf

Table 5: Low-shot generation results. With only 100 (Obama, Grumpy cat, Panda), 160 (Cat), or 389 (Dog) training images, our method is on par with the transfer learning algorithms that are pre-trained with 70,000 images.


What I want to emphasize here is that while many ML researchers are not bothered by these low-level things and focus more on algorithmic improvements, there are other researchers in the compiler/systems community working on DL optimization. Besides LLVM/CUDA, Python currently has TVM, MLIR, and more mature compiler infrastructure. These DL compiler techniques can also be reused for more general numerical tasks, e.g. computer graphics.

Even if Julia becomes mature and usable for ML one day, I wonder whether Julia can still offer more (performance) advantages than disadvantages at that point, if programmers in Python can already enjoy all of these optimizations (not only for ML, but also for other tasks).

It’s hard to say because so many pieces are unstable or purely theoretical at this point.

For example, TVM/Relay’s training support is incomplete and still quite buggy for any kind of non-trivial model. Moreover, TVM in general kind of hits the wrong part of the equation by focusing on superoptimizing programs at the expense of compile time, whereas I think your average ML researcher would prefer “just enough” optimization so as not to impede iteration times.

If we talk about more general frameworks like MLIR, then it becomes a question of what makes a better frontend. Here I’d argue that Julia (or Rust, Swift, etc.) is much better suited than Python because it was specifically designed to plug into an optimizing compiler (LLVM). If you look at the current state of Python bindings for MLIR-backed tech, they’re essentially the same old “write some DSL with limited language semantics and send it off to a black-box compiler” pattern. Issues around debugging and introspection are marginally better at best. It’s telling that something like LoopVectorization (which could use MLIR eventually) feels far more integrated than something like Numba (which has been experimenting with MLIR).
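As a small illustration of that “integrated” feel (a minimal sketch using LoopVectorization’s `@turbo` macro on an ordinary Julia loop; my own example, not from the original post):

```julia
using LoopVectorization

# A plain Julia function; @turbo rewrites the loop with SIMD vectorization
# and unrolling at compile time, inside the language itself, rather than
# handing a DSL off to an external black-box compiler.
function dot_turbo(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    @turbo for i in eachindex(a)
        s += a[i] * b[i]
    end
    return s
end

dot_turbo(rand(1000), rand(1000))
```

The result stays debuggable and introspectable like any other Julia function (e.g. via `@code_llvm`), which is the contrast with the DSL pattern described above.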

So all in all, I think that this

is already true! The question is: where do we go from here? If the performance of interstitial “glue code” becomes more important for ML research, or if we see more of an emphasis on architectures that don’t match the “generate a big graph of linalg ops and send it off to a batch compiler” model, then Python is going to struggle. Regardless of what happens, though, I think there is value in having more mature, competitive alternatives in the ML/DL space, and Julia is currently the most viable contender there.


This commit is probably interesting:
