Why is Python, not Julia, still used for most state-of-the-art AI research?

Much low-level optimization work is handled by compiler and systems people rather than by ML researchers. The ML systems community has done really great work on compiler optimizations such as loop fusion, polyhedral optimization, scheduling languages, sparse computation, and memory reuse. I think it would be really hard for non-experts or a general-purpose compiler to beat their work.

That wasn’t my point though? Julia does make use of that same low-level optimization work (e.g. LLVM, CUDA/CUDA libraries, Enzyme, etc.) and can make use of more of it (e.g. XLA).
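To make that concrete, here is a minimal sketch (my own illustration, not from the original post; assumes CUDA.jl and a CUDA-capable GPU): ordinary Julia broadcast code is lowered through Julia’s LLVM-based codegen straight to a GPU kernel, with no separate DSL:

```julia
using CUDA

# Plain Julia broadcasting; CUDA.jl compiles the fused expression into a
# single GPU kernel via Julia's LLVM pipeline, with no graph export step.
a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
c = @. tanh(a) * b + 1f0   # one fused kernel launch
```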

What I’m saying is that a lot of ML research follows this order:

  1. Make it work
  2. Make it fast enough
  3. Make it clean and/or actually fast

Oftentimes work will stop at step 1 because it’s enough to get a paper out. Organizations with plenty of free compute can also brute-force step 2, so there goes the incentive not to use a Python framework.

Currently, Julia’s ML/DL ecosystem is at a disadvantage for step 1 because there just isn’t as much code out there for people to draw on for their own purposes. It’s a similar story for documentation and community experience. We’re working on all three areas, but these things take time. Support for step 2 was essentially non-existent until very recently, and again that’s not something that’ll happen overnight. It’s not an easy fight, but I do believe we’ll get from “I could use this” to “I want to use this” for “mainstream” ML research eventually.


AI, and even ML, these days mostly means deep learning (DL) / neural networks, and I would like to know where we’re behind in ways that really matter, i.e. for doing step 1 right. Infrastructure-wise we’re behind (only?) in scaling to really large (neural) NLP models, but I’m not sure that matters too much, as the trend is toward working smarter, not harder, in NLP and other neural networks.

The goal should be one-shot or few-shot learning, and probably smaller/different models.

DL may be a dead end, but I think it more likely that something else is needed, with DL as one component of an AI system.

What I think we should be replicating (or where I would like to know whether Julia is a better fit) is, e.g., non-DL work:

A.
https://science.sciencemag.org/content/350/6266/1332

The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches.
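As a rough illustration of that Bayesian criterion (my own hypothetical sketch in Julia, not code from the paper; names like `Hypothesis` and `score` are invented): each concept is a small generative “program”, and hypotheses are scored by log-prior plus log-likelihood of the observed examples:

```julia
# Hypothetical sketch: represent a concept as a simple generative program
# and score it under a Bayesian criterion.
struct Hypothesis
    program::Function    # program(x) returns P(x | program)
    logprior::Float64    # prior favoring simpler programs
end

loglikelihood(h::Hypothesis, examples) =
    sum(log(h.program(x)) for x in examples)

score(h::Hypothesis, examples) = h.logprior + loglikelihood(h, examples)

# One-shot classification: pick the concept whose program best explains
# the single observed example.
classify(hypotheses, example) = argmax(h -> score(h, [example]), hypotheses)
```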

B.
@DrChainsaw, while there is NAS-related work, e.g.:

there’s also the newer Neural Architecture Transfer.
From the video: “NAT is consistently more efficient (3x-9x) than EfficientNet, across various datasets without losing accuracy.”

Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. […]
we propose Neural Architecture Transfer (NAT) to overcome this limitation […]
A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets. We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings

C.
https://arxiv.org/pdf/2105.09491.pdf

benchmarks show that Retentive R-CNN significantly outperforms state-of-the-art methods on overall performance among all settings as it can achieve competitive results on few-shot classes and does not degrade the base class performance at all. Our approach has demonstrated that the long desired never-forgetting learner is available in object detection.

D.
https://arxiv.org/pdf/2006.10738.pdf

Table 5: Low-shot generation results. With only 100 (Obama, Grumpy cat, Panda), 160 (Cat), or 389 (Dog) training images, our method is on par with the transfer learning algorithms that are pre-trained with 70,000 images.


What I want to emphasize here is that while many ML researchers are not bothered by these low-level things and focus more on algorithmic improvements, there are other researchers in the compiler/systems community working on DL optimization. Besides LLVM/CUDA, Python currently has TVM, MLIR, and more mature compiler infrastructure. These DL compiler techniques can also be reused for more general numerical tasks, e.g. computer graphics.

Even if Julia becomes mature and usable for ML one day, I wonder whether Julia can still offer more (performance) advantages than disadvantages at that point, if programmers in Python can already enjoy all of these optimizations (not only for ML, but also for other tasks).

It’s hard to say because so many pieces are unstable or purely theoretical at this point.

For example, TVM/Relay’s training support is incomplete and still quite buggy for any kind of non-trivial model. Moreover, TVM in general kind of hits the wrong part of the equation by focusing on superoptimizing programs at the expense of compile time, whereas I think your average ML researcher would prefer “just enough” optimization so as not to impede iteration times.

If we talk about more general frameworks like MLIR, then it becomes a question of what makes a better frontend. Here I’d argue that Julia (or Rust, Swift, etc.) is much better suited than Python because it was specifically designed to plug into an optimizing compiler (LLVM). If you look at the current state of Python bindings for MLIR-backed tech, they’re essentially the same old “write some DSL with limited language semantics and send it off to a black-box compiler” pattern. Issues around debugging and introspection are marginally better at best. It’s telling that something like LoopVectorization (which could use MLIR eventually) feels far more integrated than something like Numba (which has been experimenting with MLIR).
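As a small illustration of that “integrated” feel (a minimal sketch using LoopVectorization’s `@turbo` macro on an ordinary Julia loop; my own example, not from the original post):

```julia
using LoopVectorization

# A plain Julia function; @turbo rewrites the loop with SIMD vectorization
# and unrolling at compile time, inside the language itself, rather than
# handing a DSL off to an external black-box compiler.
function dot_turbo(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    @turbo for i in eachindex(a)
        s += a[i] * b[i]
    end
    return s
end

dot_turbo(rand(1000), rand(1000))
```

The result stays debuggable and introspectable like any other Julia function (e.g. via `@code_llvm`), which is the contrast with the DSL pattern described above.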

So all in all, I think that this

is already true! The question is: where do we go from here? If the performance of interstitial “glue code” becomes more important for ML research, or if we see more of an emphasis on architectures that don’t match the “generate a big graph of linalg ops and send it off to a batch compiler” model, then Python is going to struggle. Regardless of what happens, though, I think there is value in having more mature, competitive alternatives in the ML/DL space, and Julia is currently the most viable contender there.


This commit is probably interesting:
