Starting a Deep Learning project, should I keep using Julia or jump to Python?

I fully switched from Matlab to Julia for my machine learning research two years ago. I really like Julia and have never looked back. With Julia, my workflow is more productive than before.
But now I am about to start a deep learning project. I need to write code similar to a Bidirectional LSTM-CRF, but with my own CRF-like implementation. I also need to customize the implementation details of some layers.

I found an example of how to write Bi-LSTM-CRF code in PyTorch, which seems to be a good start for me.

I am also aware of the deep learning packages in Julia: Flux.jl and Knet.jl. But I am not sure whether those packages fit my project well. I cannot find any implementation of Bi-LSTM-CRF in those packages.

Should I jump to Python and use PyTorch or keep using Julia? Any suggestions?


Well, Knet.jl doesn’t have to have it implemented. Knet simply accelerates the user’s NN code using its own GPU-based array type and its automatic differentiation package, AutoGrad.jl. The actual NN code for Knet models is just Julia code. This means it can do any NN as long as you know how to define it mathematically. If you do know the mathematical definition, just implement that as the predict function, and it’ll work.
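To make the "the model is just a function" idea concrete, here is a toy reverse-mode autodiff in plain Python. It is only a language-neutral sketch of the general technique: the `Value` class, `predict`, and `loss` are hypothetical names made up for illustration, and none of this is Knet's or AutoGrad.jl's actual API.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad into the parents' grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward_fn():
            self.grad += (1.0 - t * t) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Order the graph so each node is processed after everything that uses it.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn is not None:
                v._backward_fn()


def predict(w, b, x):
    """The model is ordinary code: a single tanh unit."""
    return (w * x + b).tanh()


def loss(w, b, x, y):
    """Squared error between the prediction and the target y."""
    diff = predict(w, b, x) + (-y)
    return diff * diff


w, b = Value(0.5), Value(-0.1)
l = loss(w, b, x=2.0, y=1.0)
l.backward()  # fills w.grad and b.grad
print(w.grad, b.grad)
```

The point is that `predict` contains no framework-specific layer objects; any computation you can express with the supported operations gets a gradient for free, which is the property the reply above relies on.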

Flux.jl also lets you define your own layers (at least from what I remember in @MikeInnes’ videos). I’m not exactly sure how that’s done, but look up the videos.

TensorFlow.jl is similar in that it gives you the tools to build the NN, so you’ll be able to implement whatever you need.


MXnet.jl generally has more complete models (less toolboxy, more end-result type of NN package). Have you checked that?


I have been experimenting with both TensorFlow under Julia and Knet. I have decided to use Knet for now because I find it more intuitive in Julia, although I am sure it will mean more coding on my side, as TF is the current hype. What I particularly like about Knet is how easily you can extend AutoGrad with new operators, and the fact that your data samples do not have to be of a fixed size.

Denis, the author of Knet, has put a lot of models into the examples section of the project, so I am sure you will be able to find something there.

Knet is very flexible and intuitive to use from a Julia programmer’s standpoint, so it would be a good choice. I would personally like to see Knet evolve a more PyTorch-like, layer-based API, and we’ll be doing some work towards that over the summer.

One factor in Flux’s favour here is its first-class support for RNNs. If you want a custom LSTM-like layer it’s trivial, bi-RNNs are as easy as flip(rnn) (in theory). No one else can do this and eventually I’d like for Flux to become a no-brainer for NLP and sequence-like tasks for that reason. Unfortunately, I can’t yet recommend it for something this advanced, but I will keep everyone posted.
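For readers unfamiliar with the idea behind a bidirectional RNN, here is a minimal sketch in plain Python with scalar states. It does not use or imitate Flux's API; `make_rnn_step`, `run_rnn`, and `run_bidirectional` are hypothetical helpers for illustration only. The backward direction is just the same kind of recurrence run over the reversed sequence, with its outputs reversed again so they line up with the forward ones.

```python
import math


def make_rnn_step(w_in, w_rec):
    """Return a step function h_new = tanh(w_in * x + w_rec * h) on scalars."""
    def step(h, x):
        return math.tanh(w_in * x + w_rec * h)
    return step


def run_rnn(step, xs, h0=0.0):
    """Apply `step` along the sequence; return the hidden state at each position."""
    hs, h = [], h0
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs


def run_bidirectional(fwd_step, bwd_step, xs):
    """Pair each position's left-to-right state with its right-to-left state."""
    fwd = run_rnn(fwd_step, xs)
    # Run the second recurrence on the reversed input, then flip the outputs
    # back so that index t refers to the same input position in both lists.
    bwd = run_rnn(bwd_step, xs[::-1])[::-1]
    return list(zip(fwd, bwd))


xs = [0.5, -1.0, 2.0]
states = run_bidirectional(make_rnn_step(0.8, 0.3), make_rnn_step(-0.4, 0.6), xs)
for t, (h_fwd, h_bwd) in enumerate(states):
    print(t, h_fwd, h_bwd)
```

In a Bi-LSTM-CRF, the paired states at each position would be fed through a linear layer to produce per-tag emission scores.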


I’m one of the PyTorch developers (just recently starting to explore the Julia world) and I mostly want to endorse what’s been said. The ecosystem for deep learning in Python is obviously much more mature, but Julia’s catching up quickly: MXNet.jl and TensorFlow.jl provide full-featured interfaces to their respective packages, Knet.jl aims to build a PyTorch-like define-by-run framework but isn’t as full-featured yet, and Flux.jl is the least far along but has the kind of magical Julian architecture that’s completely impossible in Python.

I’d recommend at least trying to get a CRF working in Knet based on the PyTorch tutorial, then dropping back to MXNet.jl or TensorFlow.jl if you think too much is missing. It’s the experience of people like you trying to build new models from scratch that will drive the Julia DL ecosystem forward.
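As a starting point for porting, the CRF part boils down to three small routines: scoring a tag path, computing the log-partition function with the forward algorithm, and decoding with Viterbi. Below is a minimal plain-Python sketch of a linear-chain CRF. It is an assumption-laden illustration, not the tutorial's code: there are no start/stop transitions, `emissions[t][j]` is the score of tag `j` at position `t` (what the Bi-LSTM would output), `transitions[i][j]` is the score of moving from tag `i` to tag `j`, and all function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, tags):
    """Unnormalized log-score of one tag path."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores over all tag paths."""
    n_tags = len(transitions)
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            for j in range(n_tags)
        ]
    return logsumexp(alpha)


def neg_log_likelihood(emissions, transitions, tags):
    """Training loss for one sequence: log Z minus the gold path score."""
    return log_partition(emissions, transitions) - sequence_score(
        emissions, transitions, tags
    )


def viterbi(emissions, transitions):
    """Return the highest-scoring tag path."""
    n_tags = len(transitions)
    delta = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_delta, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: delta[i] + transitions[i][j])
            ptrs.append(best_i)
            new_delta.append(delta[best_i] + transitions[best_i][j] + emissions[t][j])
        delta = new_delta
        backptrs.append(ptrs)
    # Walk the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: delta[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Toy example: 3 positions, 2 tags.
emissions = [[1.0, 0.2], [0.3, 1.5], [0.9, 0.1]]
transitions = [[0.5, -0.2], [-0.3, 0.8]]
print(neg_log_likelihood(emissions, transitions, [0, 1, 0]))
print(viterbi(emissions, transitions))
```

A custom "CRF-like" objective would typically change `sequence_score` and the recursion in `log_partition` together, since the loss is only a valid negative log-likelihood when both describe the same scoring function.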


Thanks all for your suggestions.

It seems that Julia has a bright future for programming deep learning.
I will study PyTorch’s BiLSTM-CRF code first, and then try to implement it in Knet.jl.


I just started a small learning project in Julia, and so far I’ve been really really happy with just writing out my model as a regular Julia function and using ReverseDiff.jl to compute its gradients. MLDataPattern.jl is also extremely helpful for basic data manipulation tasks (like batching and sampling). It’s awesome to be able to build a working machine learning architecture out of general-purpose tools like these. This approach wouldn’t work as well in Python, since it lacks Julia’s speed and multiple-dispatch magic.


I think one of the big advantages of Knet.jl at this point is its excellent documentation and examples.


Just a word of caution. I have written a huge codebase with Flux.jl, which we planned to use in production. But neither Tracker.jl nor Zygote.jl works with CuArrays.jl. With Tracker.jl I get

ERROR: MethodError: no method matching CuArray{Float32,2}(::UndefInitializer, ::Tuple{Int64})

(with a very deep stack trace) when attempting to calculate the gradient, while Zygote.jl throws an error already during precompilation and then overflows the stack during gradient calculation. So now I am really stuck and have to start over with Python. If I had known this from the beginning, I would have gone with Python and saved a lot of time and energy…


Have you tried using either ForwardDiff.jl or ReverseDiff.jl?

I don’t think they are compatible with Flux.jl and GPU. Besides, ForwardDiff.jl would be prohibitively slow on my R^n -> R loss function.


(I guess)

Well, if you want all the details, the issues you referenced are relatively easy, and I was able to work around them (I just did y |> cpu for the onecold one and ignored the test failures). Then I was getting hit by ERROR: MethodError: Cannot convert an object of type Array{Float32,2} to an object of type Adjoint{Float32,Array{Float32,2}}.

I hunted hard for a minimal reproducible example, but could not reduce it to a few lines. I asked about this error on Slack, with no luck. Then I decided to file an issue about it on GitHub with the example I had. It was greatly simplified from the original, but still a few hundred lines of code. In the end, I worked around it by using permutedims instead of ' everywhere.

Next, I was getting hit by this bug, not when using mapslices but when using Flux.gradient instead. Then I tried Zygote.gradient on the master branch of Zygote.jl and got a StackOverflow. Next, I tried the latest release of Zygote.jl and got ERROR: MethodError: _forward(::Zygote.Context, ::Type{Array}, ::Array{Float32,2}) is ambiguous. I asked about it on Slack and thankfully @MikeInnes provided a quick fix for it, for which I am very grateful to him. After incorporating his fix I’m now getting ERROR: UndefVarError: S not defined Stacktrace: [1] show(::IOContext{REPL.Terminals.TTYTerminal}, ::Type{SYSTEM (REPL): caught exception of type UndefVarError while trying to handle a nested exception; giving up.

All of these last issues are beyond my ability to work around or fix, and they have very deep stack traces. At this point, I have no choice but to give up and start everything over with PyTorch. I could file issues for those bugs, but it looks like my reproducible examples would again be a few hundred lines at best, even after putting in all the effort to simplify my code, so I am hesitant to do that.


See Flux bidirectional LSTM at

Thanks everyone for the replies.

It has been two years since I started my first deep learning project. I decided to use PyTorch at that time. But I left deep learning for a while to work on other projects.

I recently started working with deep learning again, and chose PyTorch as a starting point. But eventually I realized that it’s hard for me to be productive when using Python. I then switched back to Julia and wrote the code in Julia.

If anyone is interested, here is the link to the repo. It’s about replacing the standard cross-entropy loss with an objective that aligns better with the evaluation metric we care about.