State of deep learning in Julia

It’s not entirely true that Flux’s API changes without warning: in Flux itself we do tend to deprecate things properly and give people time to upgrade. NNlib is a little different because it was originally designed with library use in mind, but if people are using it directly we can and should commit to more API stability. Communication is key here: if we know what issues people are running into we’ll prioritise fixing them, or at least help you figure out the new APIs and add docs or deprecation warnings.
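For what it’s worth, Julia makes deprecations cheap to add. Here’s a minimal sketch of what one could look like; the module and the `old_conv`/`conv` names are made up for illustration, not actual NNlib API:

```julia
module MiniNNlib  # stand-in module, not the real NNlib

export conv

conv(x, w) = x .* w  # placeholder for a real implementation

# Base.@deprecate keeps the old name callable but emits a warning
# pointing at the replacement (visible with `julia --depwarn=yes`),
# so downstream users get a release cycle to migrate.
Base.@deprecate old_conv(x, w) conv(x, w)

end
```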

On performance: firstly, this kind of thing is really benchmark-sensitive. For every microbenchmark that shows X there’s one that shows !X, and I could point to blog posts etc. that find Flux much faster than TF or PyTorch for their use cases. On average all the tracing ADs have fairly similar performance IME (~1µs overhead).
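To illustrate why, here’s a rough sketch of the kind of microbenchmark in question, using BenchmarkTools with Tracker (Flux’s tracing AD); the function and array sizes are arbitrary, not the actual benchmarks being discussed:

```julia
using BenchmarkTools, Tracker

f(x) = sum(x .^ 2)

x = rand(Float32, 10)       # tiny input: fixed AD overhead dominates
y = rand(Float32, 10_000)   # larger input: real compute dominates

@btime f($x)                    # forward pass alone
@btime Tracker.gradient(f, $x)  # the gap here is mostly AD overhead
@btime Tracker.gradient(f, $y)  # same overhead, now lost in the compute
```

Shrink the arrays and the framework with the lowest fixed overhead wins; grow them and kernel quality decides instead, which is how two reasonable benchmarks can reach opposite conclusions.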

Obviously a big driver for Zygote is reducing AD overhead across the board. In my tests Zygote has ~10x less overhead than tracing ADs on a series of benchmarks I have (including convolutions, MLPs and RNNs). There are still performance bugs for sure, but if it were working perfectly in all cases we’d be releasing 1.0 rather than continuing the huge effort to develop it. In any case, turning one benchmark into a blanket statement and making out that there’s no effort going into these issues is pretty unreasonable.
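If you want to see the difference yourself, a head-to-head sketch is easy to set up (again, a toy function, not the suite the ~10x figure comes from):

```julia
using BenchmarkTools, Tracker, Zygote

# Tracker records a tape at runtime, paying per-op bookkeeping costs;
# Zygote generates adjoint code at compile time, so much of that
# overhead disappears. Function and size here are illustrative only.
f(x) = sum(tanh.(x))
x = rand(Float32, 32)

@btime Tracker.gradient(f, $x)  # tracing AD
@btime Zygote.gradient(f, $x)   # source-to-source AD
```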
