State of machine learning in Julia

There are some very interesting points brought up in this thread, and several points we are aware of as developers of the ecosystem. It is good that we are separating the concerns for “conventional” DL and SciML, since the challenges in the two fields are very different and can sometimes be at odds with each other.

For “conventional” DL, the good news is that while the ecosystem is still maturing, performance issues are being rapidly addressed. The biggest missing feature is the ability to do full-program analysis on the backwards pass, which is what projects building on AbstractInterpreter are all about. EscapeAnalysis.jl and friends will let us actually optimise the program, even if that program was generated as relatively low-level IR. As Chris mentioned, one of the goals of Diffractor is also to make such tooling available for producing better code, but the AD itself is not the bottleneck for conventional large transformers, so Diffractor is unlikely to affect training performance there. What it will help with is avoiding unnecessary allocations and the like. In that sense, it isn’t really Zygote that is to blame, but rather a missing optimisation pass that should run after the backwards pass is generated.
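To make that division of labour concrete, here is a minimal Zygote sketch. The gradient it returns is correct; the performance discussion above is entirely about how well the compiler can optimise the backwards-pass code that Zygote generates behind this call:

```julia
using Zygote

# A toy loss; Zygote generates the backwards pass for us.
loss(x) = sum(x .^ 2)

g, = Zygote.gradient(loss, [1.0, 2.0, 3.0])
# d/dx sum(x.^2) = 2x, so g == [2.0, 4.0, 6.0]
```

The correctness of `g` never depends on the optimisation passes discussed above; only the speed and allocation behaviour of computing it does.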

The other concerns lie with documentation, available models, tutorials, benchmarks, data handling and distributed training. The story here is different and can be improved with many small steps. We have put together benchmarks to track performance at https://speed.fluxml.ai. Admittedly, I am still trying to push an update so the benchmarks run regularly as the ecosystem changes, but the systems are in place and up and running.

Reports of documentation sharp edges, as well as tutorials, are always welcome! Having said that, it can be jarring to come from PyTorch/TF and find that many of the helper utilities which come up most often aren’t there. In my view, more than API docs, we need to document usage patterns. This is something we should improve on.

On the data handling subject, I agree we need to do better; it’s one of the areas where I feel Julia has a lot of potential. It is true that Julia packages don’t usually act as monoliths, so reaching for tiny, obscure packages just to load data can seem daunting. But projects such as DataSets.jl show how flexible this can be (see https://github.com/DhairyaLGandhi/ResNetImageNet.jl, which combines it with Flux for distributed training alongside DaggerFlux), especially as we keep in mind the ∂P cases that Flux handles. I think what we need are motivating examples and higher-level functions to bring these together. This is very different from how Julia packages usually compose, but it may be worthwhile to point users to the different patterns they can use for different needs. Popular cases would involve loading and preprocessing images and textual data, for which we can have default implementations. We had a function Metalhead.preprocess that did exactly that, so it is likely worthwhile to bring something like it back. The models in Metalhead have recently been updated, and we intend to host the pretrained weights again shortly too.
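As a small illustration of the kind of higher-level glue I mean, Flux already ships a batching iterator that the preprocessing and loading pieces could plug into (the array shapes and names below are purely for illustration):

```julia
using Flux

# Dummy image-like data: 100 samples of 28×28 grayscale, plus labels.
X = rand(Float32, 28, 28, 1, 100)
y = rand(1:10, 100)

# Batch and shuffle; preprocessing/augmentation would slot in around this.
loader = Flux.DataLoader((X, y); batchsize=32, shuffle=true)

for (xb, yb) in loader
    # xb has size 28×28×1×32 (the last batch may be smaller).
end
```

The missing piece is not this iterator but the default loaders and preprocessors that would feed it, which is exactly where something like Metalhead.preprocess fits.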
This is delayed, but it is definitely on the priority list, and several people are working on getting it right. We also support loading transformer models from Hugging Face via Transformers.jl. We still need to write the code for more of the standard transformer architectures, and any help on that front would be dearly appreciated; the community is always forthcoming to those willing to extend the ecosystem. Having said that, there are several specialised pretrained models available, including YOLO, as well as some pretrained transformers.

On the more philosophical side: is Flux interested in “conventional” DL? Absolutely. Do we see Flux being used for production cases? Yes. There are known areas of improvement in this sphere across the larger ecosystem, but work is ongoing.
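For a flavour of the Hugging Face path, loading a pretrained model with Transformers.jl looks roughly like this (a sketch only: it downloads weights on first use, and the exact macro/module names may differ between Transformers.jl versions):

```julia
using Transformers
using Transformers.HuggingFace

# The hgf"" string macro fetches a model hosted on the Hugging Face hub.
# "bert-base-uncased" is used here purely as a familiar example checkpoint.
bert = hgf"bert-base-uncased:model"
```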

For SciML, Chris mentioned several of the cases our tooling is working towards. This also includes models with explicit parameters, which Optimisers.jl supports. That said, incorrect gradients are not acceptable, so please make sure to open issues for any you encounter. We have also set up several instances of reverse CI to test the ecosystem better.
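For the explicit-parameter style, a minimal Optimisers.jl sketch looks like this (the plain NamedTuple “model” is just for illustration; the same `setup`/`update` API walks Flux layers too):

```julia
using Optimisers

# An explicit-parameter "model": any nested structure of arrays works.
model = (W = [1.0 2.0; 3.0 4.0], b = [0.0, 0.0])

# setup walks the structure and attaches optimiser state to each array.
state = Optimisers.setup(Optimisers.Descent(0.1), model)

# A gradient with the same structure as the model.
grad = (W = ones(2, 2), b = ones(2))

# update returns both the new optimiser state and the updated model.
state, model = Optimisers.update(state, model, grad)
# Descent subtracts 0.1 * grad, so model.b == [-0.1, -0.1]
```

Because the parameters are passed around explicitly rather than held as implicit global state, this style composes much more cleanly with SciML-style code.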
