Starting a Deep Learning project, should I keep using Julia or jump to Python?



I fully switched from Matlab to Julia for my machine learning research two years ago. I really like Julia and haven't looked back since. With Julia, my workflow is more productive than before.
But now I am starting a deep learning project. I need to write code similar to a bidirectional LSTM-CRF, but with my own CRF-like implementation. I also need to customize the implementation details of some layers.

I found an example of how to write Bi-LSTM-CRF code in PyTorch, which seems to be a good starting point for me.

I am also aware of the deep learning packages in Julia: Flux.jl and Knet.jl. But I am not sure whether those packages fit my project well. I cannot find any implementation of Bi-LSTM-CRF in either package.

Should I jump to Python and use PyTorch, or keep using Julia? Any suggestions?


Well, Knet.jl doesn’t have to have it implemented. Knet simply accelerates the user’s NN code using its GPU-based array type and its automatic differentiation package, AutoGrad.jl. The actual NN code for a Knet model is plain Julia, which means it can handle any network as long as you know how to define it mathematically. If you do know the mathematical definition, just implement it as the predict function, and it’ll work.
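To make the "plain Julia code" point concrete, here is a minimal sketch of that pattern (toy model and values, not from Knet's docs). With Knet you would move the arrays to the GPU as KnetArrays and take gradients with AutoGrad's `grad`; the model and loss definitions themselves would look the same.

```julia
# In the Knet style, a model is just a Julia function.
# Any math you can write down works as a predict function.
predict(w, b, x) = tanh.(w * x .+ b)

# Mean squared error loss over one sample.
loss(w, b, x, y) = sum(abs2, predict(w, b, x) .- y) / length(y)

# Toy parameters and data, using plain CPU arrays.
w = [0.1 0.2; 0.3 0.4]
b = [0.0, 0.0]
x = [1.0, -1.0]
y = [0.0, 0.0]

println(loss(w, b, x, y))
```

From here, Knet's job is just to differentiate `loss` with respect to `w` and `b` and run the array operations on the GPU.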

Flux.jl also lets you define your own layers (at least from what I remember of @MikeInnes’ videos). I’m not exactly sure how that’s done, but look up the videos.

TensorFlow.jl is similar in that it gives you the tools to build the NN, so you’ll be able to implement whatever you need.


MXNet.jl generally has more complete models (less of a toolbox, more of an end-result type of NN package). Have you checked it out?


I have been experimenting with TensorFlow under Julia and with Knet. I have decided to use Knet for the moment, because I have found it more intuitive under Julia, although I am sure it will mean more coding on my side, since TF is the current hype. What I particularly like about Knet is the ease with which you can extend the autograd with new operators, and the fact that your data samples do not have to be of fixed size.

Deniz, the author of Knet, has put a lot of models into the examples section of the project, so I am sure you will be able to find something there.


Knet is very flexible and intuitive to use from a Julia programmer’s standpoint, so it would be a good choice. I would personally like to see Knet evolve a more PyTorch-like, layer-based API, and we’ll be doing some work towards that over the summer.

One factor in Flux’s favour here is its first-class support for RNNs. If you want a custom LSTM-like layer, it’s trivial, and bi-RNNs are as easy as flip(rnn) (in theory). No one else can do this, and eventually I’d like Flux to become a no-brainer for NLP and sequence-like tasks for that reason. Unfortunately, I can’t yet recommend it for something this advanced, but I will keep everyone posted.
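Conceptually, a bidirectional wrapper like flip(rnn) has very little to do: run the same recurrent step forwards over the sequence, run it again over the reversed sequence, and pair up the hidden states. A plain-Julia sketch of the idea (this is not Flux's actual API, just the concept with a toy scalar-state cell):

```julia
# Toy recurrent cell: combines the previous hidden state with the input.
step(h, x) = tanh(0.5h + x)

# Run the cell over a sequence, collecting all hidden states.
function run(xs)
    h, hs = 0.0, Float64[]
    for x in xs
        h = step(h, x)
        push!(hs, h)
    end
    hs
end

# Bidirectional pass: forward states paired with reversed backward states.
bidir(xs) = collect(zip(run(xs), reverse(run(reverse(xs)))))
```

Each element of `bidir(xs)` then carries context from both directions, which is exactly what the CRF layer on top of a Bi-LSTM consumes.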


I’m one of the PyTorch developers (just recently starting to explore the Julia world) and I mostly want to endorse what’s been said :slight_smile: . The ecosystem for deep learning in Python is obviously much more mature, but Julia’s catching up quickly: MXNet.jl and TensorFlow.jl provide full-featured interfaces to their respective packages, Knet.jl aims to build a PyTorch-like define-by-run framework but isn’t as full-featured yet, and Flux.jl is the least far along but has the kind of magical Julian architecture that’s completely impossible in Python.

I’d recommend at least trying to get a CRF working in Knet based on the PyTorch tutorial, then dropping back to MXNet.jl or TensorFlow.jl if you think too much is missing. It’s the experience of people like you trying to build new models from scratch that will drive the Julia DL ecosystem forward.
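For what it's worth, the CRF piece of the PyTorch tutorial ports to plain Julia quite directly, since the decoding step is just the Viterbi algorithm over emission and transition scores. A hedged, self-contained sketch (toy scores; in a real Bi-LSTM-CRF the emission matrix would come from the Bi-LSTM and the transition matrix would be learned):

```julia
# emissions[t, j]: score of tag j at timestep t (from the Bi-LSTM in a real model)
# trans[i, j]:     score of transitioning from tag i to tag j
# Returns the highest-scoring tag sequence as a vector of tag indices.
function viterbi(emissions, trans)
    T, K = size(emissions)
    score = copy(emissions[1, :])      # best score ending in each tag so far
    back  = zeros(Int, T, K)           # backpointers for path recovery
    for t in 2:T
        new = similar(score)
        for j in 1:K
            best, i = findmax(score .+ trans[:, j])
            new[j] = best + emissions[t, j]
            back[t, j] = i
        end
        score = new
    end
    # Backtrack from the best final tag.
    path = zeros(Int, T)
    _, path[T] = findmax(score)
    for t in T-1:-1:1
        path[t] = back[t+1, path[t+1]]
    end
    path
end
```

The training-time forward algorithm is the same recursion with `findmax` replaced by log-sum-exp, which is why a define-by-run framework like Knet, where this is just a Julia loop, is a comfortable fit.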


Thanks all for your suggestions.

It seems that Julia has a bright future for deep learning programming.
I will study PyTorch’s BiLSTM-CRF code first, and try to implement it in Knet.jl.


I just started a small learning project in Julia, and so far I’ve been really really happy with just writing out my model as a regular Julia function and using ReverseDiff.jl to compute its gradients. MLDataPattern.jl is also extremely helpful for basic data manipulation tasks (like batching and sampling). It’s awesome to be able to build a working machine learning architecture out of general-purpose tools like these. This approach wouldn’t work as well in Python, since it lacks Julia’s speed and multiple-dispatch magic.
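The workflow described above might look something like the sketch below. The model is an ordinary Julia function, which is what makes it differentiable by a generic AD package like ReverseDiff.jl; to keep this snippet self-contained, a hand-rolled central finite difference stands in for the package call (in practice you would use ReverseDiff's gradient function instead).

```julia
# An ordinary Julia function as the "model" and a squared-error loss on one sample.
model(w, x) = sum(w .* x)
loss(w) = (model(w, [1.0, 2.0]) - 3.0)^2

# Central finite-difference gradient: the stand-in here for what an AD
# package such as ReverseDiff.jl would compute exactly.
function numgrad(f, w; h = 1e-6)
    g = similar(w)
    for i in eachindex(w)
        wp, wm = copy(w), copy(w)
        wp[i] += h
        wm[i] -= h
        g[i] = (f(wp) - f(wm)) / (2h)
    end
    g
end
```

At `w = [0.0, 0.0]` the analytic gradient of `loss` is `[-6.0, -12.0]`, and a gradient-descent loop over `numgrad` (or the AD equivalent) is all the "training framework" this approach needs.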


I think one of the big advantages of Knet.jl at this point is its excellent documentation and examples.