State of machine learning in Julia

I love Julia, and it’s my first choice whenever possible. However, when it comes to ANNs, I’m afraid I still turn to Python/PyTorch. I would prefer to use Julia instead, so I figure it might be useful to share the downsides I’ve encountered for “mainstream” machine learning models.

Examples of state-of-the-art models are not readily available

For example, consider GPT-2 from 2019, ages ago in ML land :wink: . If you search Google for “julia gpt-2”, the first result is a blog post by someone named Julia describing a Python implementation. I mention that partly in jest, but it’s an apt summary: even when explicitly looking for Julia implementations of SotA models, you’ll probably find a Python implementation first.

If you dig a little deeper, you’ll find Transformers.jl, which does indeed have a GPT-2 implementation. But now try to find a Julia implementation of Vision Transformer, Longformer, Linformer, Compressive Transformer, RoBERTa, etc. In Python, you can readily find multiple example repos for each, typically including reference implementations from the authors or a major project.
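For concreteness, loading that GPT-2 through Transformers.jl looks roughly like the sketch below. The exact loading macro and task name have shifted between Transformers.jl releases, so treat the names here as illustrative rather than authoritative.

```julia
# Rough sketch of loading pretrained GPT-2 weights through Transformers.jl's
# HuggingFace integration. The hgf"model:task" string macro comes from
# Transformers.HuggingFace, but the task name and loading API have changed
# across releases, so check the current docs before relying on this.
using Transformers
using Transformers.HuggingFace

gpt2 = hgf"gpt2:lmheadmodel"   # GPT-2 with its language-modeling head
```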

Reference models are often broken

For example, the reference VGG implementation for Flux.jl was broken on Nvidia GPUs for around two years, until mid-2021. This is slightly unfair, as there were a ton of changes in Flux over this period, including the transition to Zygote. Partly, I think it reflects that there’s still a lot of research and experimentation into the best way to do AD in Julia. Hopefully, this will continue to standardize around best practices and become more robust over time.

Memory usage and speed are worse

I don’t want to dwell on this too much because my impressions may be out of date. I’ve tried porting some large ResNet-style convolutional neural networks / VAEs to Flux that operate on giant 3D movies. I haven’t been able to run models in Flux as large as the equivalents in PyTorch, although perhaps this has changed recently.
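To give a sense of the kind of model I mean, here’s a minimal sketch of a small 3D convolutional encoder in Flux (CPU only, with layer sizes I made up for illustration), using `@time` to see how much a single forward-plus-backward pass allocates. A real comparison would run an equivalent module in PyTorch on the same data.

```julia
# Minimal sketch of a small 3D convolutional encoder in Flux, to probe
# time and allocations per forward/backward pass. Layer sizes are
# hypothetical; the models I actually ported were much larger.
using Flux

model = Chain(
    Conv((3, 3, 3), 1 => 16, relu; pad = 1),
    MaxPool((2, 2, 2)),
    Conv((3, 3, 3), 16 => 32, relu; pad = 1),
    MaxPool((2, 2, 2)),
    Flux.flatten,
    Dense(32 * 8 * 8 * 8, 128, relu),
)

x = randn(Float32, 32, 32, 32, 1, 4)   # spatial dims × channels × batch
loss(m, x) = sum(abs2, m(x))

# The first call includes compilation; the second shows steady-state
# time and allocations for a forward + backward pass.
@time gradient(m -> loss(m, x), model)
@time gradient(m -> loss(m, x), model)
```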

Performance benchmarks are hard to come by

A number of folks have made some really nice benchmarks comparing Flux / PyTorch / TensorFlow, but I’m not aware of any that are regularly maintained. So even as someone who’s eager to use Julia for ML, since all my other code is in Julia already, it’s hard for me to assess whether the ecosystem can meet my research needs without diving in to code up a benchmark.
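For what it’s worth, the Flux half of such a micro-benchmark is quick to write; the sketch below (hypothetical layer sizes) times a forward pass and a forward-plus-backward pass with BenchmarkTools, and would need a matching PyTorch script on the same hardware to be a real comparison.

```julia
# Sketch of the Flux side of a micro-benchmark: time the forward pass and
# the forward + backward pass of a small MLP. A fair comparison needs an
# equivalent PyTorch script run on the same hardware.
using Flux, BenchmarkTools

model = Chain(Dense(784, 256, relu), Dense(256, 256, relu), Dense(256, 10))
x = randn(Float32, 784, 128)                    # batch of 128

@btime $model($x)                               # forward only
@btime gradient(m -> sum(abs2, m($x)), $model)  # forward + backward
```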

Some useful recent benchmarks include:

https://github.com/avik-pal/DeepLearningBenchmarks/tree/update (Feb 2020; Flux within 0.5-1x of PyTorch for common layers)
Why is flux model slower than python? (Jan 2021; Flux 0.5x of PyTorch for VGG19)
Julia slowdown on long running programs with many allocations (June 2021; issues with memory growth over time, solvable by calling the garbage collector; see the sketch after this list)
Allocation of Memory while evaluate a model (Nov 2021; Memory is allocated each time a model is run)
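The garbage-collector workaround mentioned above looks roughly like the sketch below. It uses the implicit-parameter training style of the Flux versions from that era; the model, data, and the every-100-steps interval are placeholders I made up.

```julia
# Sketch of the workaround for memory growth in long-running loops:
# periodically force a garbage-collection pass (and, on GPU, also release
# cached device memory). The every-100-steps interval is arbitrary.
using Flux
# using CUDA   # uncomment for GPU runs

model = Chain(Dense(100, 64, relu), Dense(64, 1))
opt = ADAM(1e-3)
ps = Flux.params(model)

for step in 1:10_000
    x = randn(Float32, 100, 32)
    y = randn(Float32, 1, 32)
    gs = gradient(() -> Flux.Losses.mse(model(x), y), ps)
    Flux.Optimise.update!(opt, ps, gs)

    if step % 100 == 0
        GC.gc()          # keep allocations from piling up over long runs
        # CUDA.reclaim() # also return freed device memory when on GPU
    end
end
```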

I think these highlight common pain points for a PyTorch developer considering a switch to Flux.

There has been, and continues to be, major, major progress in the Julia ML ecosystem, and there is a ton of cool stuff that can practically only be done in Julia (I’m looking at you, SciML!). It’s also clear that Julia has an awesome / arguably the best skeleton for ANNs: Where we are headed and why it looks a lot like Julia (but not exactly like Julia) - compiler - PyTorch Dev Discussions. My sense is that it’s largely a matter of having enough dev time. Google and Facebook have a ton of engineers working on their frameworks, and the Flux team has been disproportionately productive, all things considered. I would think that companies allocating more resources to the ecosystem could really accelerate adoption.
