Although Julia is promoted as an excellent language for deep learning, I still don't see any framework I could use in production or even in long-term research. Here are the options I have considered at different points in time:
MXNet.jl
MXNet.jl is a Julia interface to the core library written in C++ (and Python?). Like any wrapper, MXNet uses borrowed data structures and doesn't feel "native", which in practice usually means the library doesn't play well with other common packages. But the biggest concern is the user base - I don't see much interest in MXNet in the community, nor is there a high pace of development.
TensorFlow.jl
I remember it being quite popular a year or two ago, but just like MXNet it seems to attract little interest nowadays. Also, even though it includes the latest innovations from TF 2.0, I'm not sure how complete the Julia API is compared to the Python version.
Knet.jl
To my mind this project is one of the best examples of good programming style and software management in general. Knet maintains very good backward compatibility, has excellent documentation and shows high performance. Knet comes with its quirks though - I still can't get used to wrapping everything into KnetArray and writing predict(w,x) = w[1]*x .+ w[2] instead of predict(w,b,x) = w*x .+ b. Maybe one day I will stop worrying and learn to love the API, but that day hasn't come yet.
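To show what I mean, here is roughly what the linear-regression example from the Knet tutorial looks like (from memory, so details may be off): all parameters get bundled into a single collection because AutoGrad's grad() differentiates only with respect to the first argument.

```julia
using Knet, Statistics

# On the GPU these arrays would additionally be wrapped in KnetArray.
predict(w, x) = w[1] * x .+ w[2]               # w[1] = weights, w[2] = bias
loss(w, x, y) = mean(abs2, predict(w, x) .- y)
lossgradient = grad(loss)                      # gradient w.r.t. the first argument only

w = Any[0.1f0 * randn(Float32, 1, 10), zeros(Float32, 1, 1)]
x, y = randn(Float32, 10, 100), randn(Float32, 1, 100)

for epoch in 1:100
    g = lossgradient(w, x, y)
    for i in eachindex(w)
        w[i] -= 0.1 * g[i]
    end
end
```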
(There's also a more personal concern about automatic differentiation - after having designed four AD packages I have a pretty strong opinion on how it should be done, and AutoGrad.jl doesn't match those criteria.)
Flux.jl
Flux is the most frequently recommended framework for deep learning in Julia; however, I don't see it as practical, for two reasons.
Firstly, Flux (and its underlying library NNlib) has an extremely unstable API. Functions and types get removed or replaced without any deprecation or prior notice. I was the first to introduce conv2d() to NNlib (actually borrowed from Knet), but after a couple of months the convolutions were rewritten in pure Julia and renamed to just conv with a new argument list (boom! my own code that depends on NNlib suddenly stopped working). 7 months ago the API changed again - conv got a new required argument of type ConvDims. By the way, there are no docs on how to properly construct a ConvDims, and if you try to follow the comments in the source code, you will find that they no longer work and you should use DenseConvDims instead.
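For reference, this is the calling sequence I eventually arrived at. I'm not certain it's the intended one, so treat it as a sketch against the current NNlib rather than documentation:

```julia
using NNlib

# WHCN layout: width × height × channels × batch
x = rand(Float32, 28, 28, 1, 8)   # input batch
w = rand(Float32, 5, 5, 1, 16)    # 5×5 kernel, 1 input channel, 16 output channels

# The dims object now has to be constructed explicitly and passed to conv.
cdims = DenseConvDims(x, w; stride=1, padding=0)
y = conv(x, w, cdims)             # 24×24×16×8
```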
This instability spreads to Flux itself - with the latest Flux v0.8.3 a large portion of the Model Zoo is broken because of the changed maxpool(), which now requires a new argument of type PoolDims (and I'm still looking for the proper way to use it). I don't want to blame anyone - eventually, it is you who create the value - but please remember that keeping pace with these changes can be quite painful for someone not closely following the project.
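For what it's worth, the closest I've gotten to the new signature is the following; I'm not sure this is the intended usage, so take it as a guess rather than a recipe:

```julia
using NNlib

x = rand(Float32, 24, 24, 16, 8)

# PoolDims takes the input and the pooling window; stride defaults to the window size.
pdims = PoolDims(x, (2, 2))
y = maxpool(x, pdims)             # 12×12×16×8
```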
Secondly, Flux is slow, and there's very little activity aimed at making it faster. For example, in one benchmark (Knet vs Flux etc.) Flux was ~3 times slower than Knet. There's hope that Zygote - the upcoming AD engine - will fix this, but so far my experience has been the opposite (there's also a more recent issue about a performance regression compared to the current Tracker).
My latest ResNet-based Siamese network in PyTorch took 3 days of training on a cloud Tesla V100 to get the first meaningful results. Using a framework that is 3-10 times slower is simply impractical in such a setting.
Please share your experience, suggestions and, if you have one, your grand plan for the development of deep learning infrastructure in Julia.