Most of this low-level optimization work is done by compiler and systems people rather than ML researchers. The ML systems community has done really great work on compiler optimizations such as loop fusion, polyhedral optimization, scheduling languages, sparse computation, and memory reuse. I think it would be really hard for non-experts, or for a general-purpose compiler, to beat their work.
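As a concrete example of what loop fusion means here (a minimal Julia sketch, nothing specific to any of those ML compilers): the dot-broadcast syntax fuses a chain of elementwise operations into a single loop with no intermediate arrays.

```julia
x = rand(Float32, 10^6)
y = rand(Float32, 10^6)

# One fused loop over x and y; a naive translation would allocate
# a temporary array for each intermediate result (2x, 2x + y, ...).
z = @. sqrt(2x + y)

# Equivalent hand-written fused loop:
z2 = similar(x)
for i in eachindex(x, y)
    z2[i] = sqrt(2x[i] + y[i])
end
```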
That wasn't my point, though. Julia does make use of that same low-level optimization work (e.g. LLVM, CUDA/CUDA libraries, Enzyme, etc.) and can make use of more of it (e.g. XLA).
What I'm saying is that a lot of ML research follows this order:
- Make it work
- Make it fast enough
- Make it clean and/or actually fast
Oftentimes work will stop at step 1 because it's enough to get a paper out. Organizations with plenty of free compute can also brute-force step 2, so there goes the incentive not to use a Python framework.
Currently, Julia's ML/DL ecosystem is at a disadvantage for 1 because there just isn't as much code out there for people to pull on for their own purposes. Similar story for documentation and community experience. We're working on all 3 areas, but these things take time. Support for 2 was essentially non-existent up until very recently, and again not something that'll happen overnight. It's not an easy fight, but I do believe we'll get from "I could use this" to "I want to use this" for "mainstream" ML research eventually.
AI, and even ML, is more than deep learning (DL) or neural networks, and I would like to know where we're behind in ways that really matter, so that we do step 1 on the right things. Infrastructure-wise we're behind (only?) in scaling to really large (neural) NLP models, but I'm not sure that matters too much, since the trend is working smarter, not harder, in NLP and other neural networks.
The goal should be one-shot or few-shot learning, and probably smaller/different models.
DL may be a dead end, but more likely something else is needed, and DL can be one part of such an AI.
What I think we should be replicating (or where I would like to know whether Julia is more ideal) is, e.g., non-DL work like:
A.
https://science.sciencemag.org/content/350/6266/1332
The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches.
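To make the "programs scored under a Bayesian criterion" idea concrete, here is a toy sketch; the candidate programs, priors, and noise model below are all made up for illustration and are nothing like the paper's actual model:

```julia
# Toy Bayesian program induction: each "program" is a candidate generative
# rule; we score it on observed examples by log prior + log likelihood and
# keep the best-scoring one.

# Hypothetical candidate programs (simpler rule => higher prior).
programs = [
    (name = "double",    f = x -> 2x,    logprior = log(0.5)),
    (name = "square",    f = x -> x^2,   logprior = log(0.3)),
    (name = "increment", f = x -> x + 1, logprior = log(0.2)),
]

# Observed examples: inputs with noisy outputs.
data = [(1, 2.1), (2, 3.9), (3, 6.2)]

# Gaussian log likelihood of the observations under a program.
loglik(f, data; σ = 0.5) =
    sum(-0.5 * ((y - f(x)) / σ)^2 - log(σ * sqrt(2π)) for (x, y) in data)

# Posterior score ∝ prior × likelihood (sum of logs in log space).
scores = [(p.name, p.logprior + loglik(p.f, data)) for p in programs]
best = argmax(last, scores)
println("best program: ", first(best))  # expect "double" for this data
```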
B.
@DrChainsaw, while there is NAS-related work already, e.g.:
there's also something newer: Neural Architecture Transfer
From the video: "NAT is consistently more efficient (3x-9x) than EfficientNet, across various datasets without losing accuracy."
Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. […]
we propose Neural Architecture Transfer (NAT) to overcome this limitation […]
A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets. We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings.
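To give a rough feel for the supernet/subnet idea, here is a heavily simplified toy sketch; the layer options, scores, and random search below are invented for illustration and are not NAT's actual method:

```julia
# Toy supernet search: a "supernet" is a set of alternative options per
# layer; a "subnet" picks one option per layer. Real NAS evaluates subnets
# on validation accuracy; here a dummy score trades off a fake accuracy
# benefit against a fake parameter-count cost.

layer_options = [
    [:conv3x3, :conv5x5, :skip],   # options for layer 1
    [:conv3x3, :conv5x5, :skip],   # options for layer 2
    [:dense64, :dense128],         # options for layer 3
]

# Made-up benefit/cost tables standing in for measured accuracy and size.
benefit = Dict(:conv3x3 => 0.7, :conv5x5 => 0.8, :skip => 0.1,
               :dense64 => 0.5, :dense128 => 0.6)
cost    = Dict(:conv3x3 => 1.0, :conv5x5 => 2.5, :skip => 0.0,
               :dense64 => 0.5, :dense128 => 1.0)

score(subnet) = sum(benefit[op] for op in subnet) -
                0.2 * sum(cost[op] for op in subnet)

# Plain random search over subnets (real methods are far smarter).
candidates = [[rand(opts) for opts in layer_options] for _ in 1:1000]
best = argmax(score, candidates)
println("best subnet: ", best, "  score: ", score(best))
```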
https://arxiv.org/pdf/2105.09491.pdf
benchmarks show that Retentive R-CNN significantly outperforms state-of-the-art methods on overall performance among all settings as it can achieve competitive results on few-shot classes and does not degrade the base class performance at all. Our approach has demonstrated that the long desired never-forgetting learner is available in object detection.
https://arxiv.org/pdf/2006.10738.pdf
Table 5: Low-shot generation results. With only 100 (Obama, Grumpy cat, Panda), 160 (Cat), or 389 (Dog) training images, our method is on par with the transfer learning algorithms that are pre-trained with 70,000 images.
What I want to emphasize here is that while many ML researchers are not bothered by these low-level things and focus more on algorithmic improvements, there are other researchers in the compiler/systems community working on DL optimization. Besides LLVM/CUDA, Python currently has TVM, MLIR, and other mature compiler infrastructure. These DL compiler techniques can also be reused for more general numerical tasks, e.g. computer graphics.
Even if Julia becomes mature and usable for ML one day, I wonder whether Julia will still offer more (performance) advantages than disadvantages at that point, if programmers in Python can already enjoy all of these optimizations (not only for ML, but also for other tasks).
It's hard to say, because so many pieces are unstable or purely theoretical at this point.
For example, TVM/Relay's training support is incomplete and still quite buggy for any kind of non-trivial model. Moreover, TVM in general kind of hits the wrong part of the equation by focusing on superoptimizing programs at the expense of compile time, whereas I think your average ML researcher would prefer "just enough" optimization to not impede iteration times.
If we talk about more general frameworks like MLIR, then it becomes a question of what makes a better frontend. Here I'd argue that Julia (or Rust, Swift, etc.) is much better suited than Python, because it was specifically designed to plug into an optimizing compiler (LLVM). If you look at the current state of Python bindings for MLIR-backed tech, they're essentially the same old "write some DSL with limited language semantics and send it off to a black-box compiler" pattern. Issues around debugging and introspection are marginally better at best. It's telling that something like LoopVectorization (which could use MLIR eventually) feels far more integrated than something like Numba (which has been experimenting with MLIR).
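For example, a minimal sketch with LoopVectorization's `@turbo` macro; the point is that it operates on a plain Julia loop rather than a separate DSL:

```julia
using LoopVectorization  # provides the @turbo macro

# An ordinary Julia loop; @turbo rewrites it with SIMD and unrolling.
# No DSL and no black-box compilation step: it is still just Julia code
# you can edit, step through, and benchmark like any other function.
function saxpy!(y, a, x)  # assumes length(y) == length(x)
    @turbo for i in eachindex(x)
        y[i] = a * x[i] + y[i]
    end
    return y
end

y = ones(Float32, 1024); x = rand(Float32, 1024)
saxpy!(y, 2.0f0, x)
```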
So all in all, I think that this
is already true! The question is: where do we go from here? If the performance of interstitial "glue code" becomes more important for ML research, or if we see more of an emphasis on architectures that don't match the "generate a big graph of linalg ops and send it off to a batch compiler" model, then Python is going to struggle. Regardless of what happens, I think there is value in having more mature, competitive alternatives in the ML/DL space, and Julia is currently the most viable contender there.
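As a toy illustration of that "glue code" point (the function below is made up for the example): data-dependent scalar loops like this compile to fast native code in Julia as-is, whereas in Python they would have to be vectorized or pushed into a framework op to avoid interpreter overhead.

```julia
# Hypothetical "glue" between framework ops: a custom running-state step
# that doesn't map cleanly onto a batched linear-algebra kernel.
function clipped_cumsum!(out, x, lo, hi)
    acc = zero(eltype(x))
    for i in eachindex(x)
        acc = clamp(acc + x[i], lo, hi)  # data-dependent state: hard to batch
        out[i] = acc
    end
    return out
end

x = randn(10^6)
clipped_cumsum!(similar(x), x, -1.0, 1.0)
```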
This commit is probably interesting: