Is it a good time for a PyTorch developer to move to Julia? If so, Flux? Knet?

For me, the move from Keras to PyTorch was the best decision in my AI career. I am considering moving from Python to Julia because of performance and because the “NumPy” equivalent is built in (as it was in Matlab, rest its soul).
However, I must know if there is a solid AI framework. Some of my critical must-haves are:

  • Ability to stop in the debugger, inside the model, in the loss function…
  • An IDE (or plugin to an existing IDE) that does auto-complete and allows observing variables while debugging
  • Classes to efficiently load data for AI training (think of PyTorch’s classes: DataLoader, Dataset, Sampler… which allow super-efficient, parallelized sample loading)

Which of the Flux/Knet frameworks is more PyTorch-like? Can you give some insight?


Hi! I’m not qualified to comment on the ML frameworks per se, but I can tell you about the IDE/debugger situation and you can decide for yourself if Julia’s current capabilities satisfy you. Tl;dr yes, there are sufficiently good debuggers and IDEs (yes, debuggers, you read that right), but there are important differences between Julia and Python which necessitate a somewhat different mindset.

First, an answer to your question about Debugger/IDE

There are two main IDEs in Julia, which are really plugins for Atom and VS Code. The plugin for Atom is called Juno; the plugin for VS Code is called Julia for VS Code. They’re both really good and have nice autocomplete options, but if you want debugger support in the IDE, you want Juno. If you’re like me, and you end up using two windows, one with your text editor and one with your Julia REPL, as your “IDE,” any text editor with autocomplete will do.

There are also two different (main) debuggers that you’ll want to learn about, because each one has different use cases. Debugger.jl is your best bet if you want to step through functions, but it runs Julia code through the interpreter instead of compiling it, and if you need to figure out why your ML model starts diverging after training for 30 epochs on 10 million data points, that’s going to be too slow for you. If you need to debug but also compile your code for speed, you’ll need Infiltrator.jl, which will allow you to observe everything that’s going on at the breakpoints you set, but will only enter debug mode at the breakpoints that you set before running your code. I believe the integrated debugger in Juno is based on Debugger.jl, but I’m not 100% sure.
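To make the difference concrete, here is a hedged sketch of how each debugger is invoked (the package names and macros are real; the toy functions are made up for illustration):

```julia
# Debugger.jl: runs code through the interpreter, so you can step anywhere.
using Debugger

f(x) = 2x + 1
@enter f(3)            # drops you into the stepping interface at f

# Infiltrator.jl: code stays compiled; execution pauses only at the
# @infiltrate breakpoints you placed before running, opening a REPL
# with access to the local variables at that point.
using Infiltrator

function train_step(x)
    y = 2x
    @infiltrate        # inspect `x` and `y` here, then type @continue
    return y + 1
end
```

Both blocks assume the respective package is installed; the workflow details above are the part to take away, not the toy functions.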

Second, some differences between Julia and Python

I want to point out something else that may be difficult to deal with if you’re coming from Python (it was for me!). Julia code is 100% compiled before execution, unless using the aforementioned interpreter, which you almost never do unless you’re debugging. Because of various issues that you may learn more about later, it’s difficult to store compiled code between Julia sessions (possible, but difficult).

The major upshot of this is that your basic workflow is probably going to be very different from what it is in Python. If you have a script you want to run, say with julia myscript.jl, and you want it to analyze some data and make a plot, which is a pretty common thing to do in Python, Julia is going to have to compile your entire script plus all the functions you called from the plotting package and other packages, every time you run your script. This is a pathological case for running Julia code quickly. It may not make much difference if your ML models are big enough (i.e. they take more than a few seconds to run), but it will be very noticeable when trying to do quick data-analysis-type stuff.

The main way to deal with this is a Julia package called Revise, which essentially hot-reloads changes you save to your Julia code. Practically everybody here uses Revise. Learn to love it. :smiley: (I can comment more on how to use Revise if you’re interested, although everybody has their own way of doing it.)
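A minimal sketch of a Revise-based session (the file name and function are just placeholders):

```julia
using Revise

# includet is Revise's "tracked" include: functions defined in the file
# are re-evaluated automatically whenever you save an edit.
includet("analysis.jl")

make_plot()               # defined in analysis.jl
# ...edit and save analysis.jl in your editor...
make_plot()               # picks up your changes, no session restart needed
```

The point is that the Julia process (and everything already compiled in it) stays alive across edits, which sidesteps the recompile-everything cost described above.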

Another major difference (some might say, the major difference) between Julia and Python is that Julia doesn’t have classes. Forget typing object.method(other_object); in Julia, it’s always method(object, other_object). This is because of something called multiple dispatch, which turns out to be really, really powerful. I think it’s not an exaggeration to say that many of the really dedicated Julia users come for performance and stay for multiple dispatch.
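A tiny self-contained sketch of what multiple dispatch means in practice: the method is chosen based on the runtime types of all arguments, not just the first one (the types here are, of course, made up):

```julia
struct Dog end
struct Cat end

# One generic function, several methods; Julia picks the most specific
# match for the *combination* of argument types.
interact(::Dog, ::Dog) = "play"
interact(::Dog, ::Cat) = "chase"
interact(::Cat, ::Dog) = "hiss"
interact(a, b) = "ignore"        # fallback for any other pair

interact(Dog(), Cat())   # → "chase"
interact(Cat(), Dog())   # → "hiss"
```

In a single-dispatch OOP language only the receiver’s type picks the method; here the Dog/Cat and Cat/Dog pairs dispatch to different methods.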

Hopefully that’s helpful, and hopefully you find Julia to your liking!


It feels like Julia’s ecosystem is starting to snowball, and we are seeing many of these tools emerge from the community. Some already exist (e.g. Juno as an IDE; Debugger.jl as a debugger, which I believe is wrapped into Juno; and Flux’s DataLoader as a possible avenue for loading data, although there might be more mature/better-tested tools out there).
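For reference, a hedged sketch of Flux’s DataLoader (the exact API has moved around between Flux versions, so treat this as illustrative rather than definitive):

```julia
using Flux

X = rand(Float32, 4, 100)   # 4 features × 100 samples
Y = rand(Float32, 1, 100)   # matching labels

# Batches and shuffles in-memory arrays; similar in spirit to PyTorch's
# DataLoader, but without its multi-process worker parallelism.
loader = Flux.Data.DataLoader((X, Y), batchsize = 16, shuffle = true)

for (x, y) in loader
    # x is 4×16 here (the last batch may be smaller)
end
```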

I think there are a few things to keep in mind (which @doomphoenix-qxz mentioned). 1. You don’t need to buy into a single framework (like PyTorch or TensorFlow): you can often pick and choose packages that work best for you, and interop with a deep learning package should be easy. 2. The change will also take a rather different mindset than what you are used to in OOP, but I can say embracing multiple dispatch has significantly improved my code design/productivity over C++ and Python.

For new projects, Flux and Knet are both decent ANN packages. I use Flux as a daily driver for the majority of my research and have few problems (albeit I mostly use small/straightforward models). People have been generally recommending Flux, but I think you would need to make this decision based on project needs.

You might also want to check out MLJ, which is a machine learning ecosystem built in Julia (kind of like scikit-learn).

For other tools, you might want to check out this handy page in Flux’s documentation: The Julia Ecosystem · Flux. It details a bunch of packages which you might find useful.


I think the speed of PyTorch is okay? Maybe state-of-the-art? :thinking:

Julia is amazing and has the potential to be much better, easier to use, and faster than PyTorch. It can already do things that PyTorch cannot; however, I’d say the DL ecosystem is not mature enough to give you the smooth experience you’re used to. Unless you want to dive in and help work out bugs, missing kernels, memory issues, etc., I would check back in after a couple of months. Once it hits this level of usability for the average user, I think it will take off.

However, if you are doing a lot with custom GPU kernels (possible in pure Julia), scalar operations (much faster), or neural-ODE-type things (faster and more vibrant), Julia is already far ahead.


@Alon welcome!

Yes and Flux.
Be aware that some things still need to be polished.
I’ve personally found the benefits of Julia outweigh the costs.

The best way for you to answer the question is to take it for a test drive.
If I were you, I would download Julia, and the IDE Juno (which has auto-complete & debugger).
Then work through some Flux examples.

@Alon what do you currently use PyTorch for? Images?


doomphoenix-qxz amazing overview! thanks!
mkschleg Ratingulate Albert_Zevelev thanks so much for the insights!
I think that you can definitely add Amazing Community to the benefits of Julia.

Yeah, I’ve been doing Images (medical images) for the past few years. Unfortunately, I’m not in a position to contribute bugfixes and features at the moment so I’m looking at Julia from a regular user perspective. However, having all those potential performance benefits + IDE + debugger + a good start for a DL framework is more than enough to start some nice side project with Julia, and who knows where it will go from there :wink:


I have also switched from Pytorch.

Within a few years I think the strengths of Julia will place it far ahead of Pytorch and others:

  • PyTorch requires underlying code to be written in C++/CUDA to get the needed performance, which means roughly 10x as much code to write.

  • With Flux in particular, native data types can be used. This means that you can potentially take the gradient through some existing code (say a statistics routine) that was never intended for use with Flux. To do this with Pytorch would require re-coding the equivalent python to use torch.xx data structures and calls. The potential code base for Flux is already vastly larger than for Pytorch because of this.

  • Metaprogramming. I think there is nothing like it in other languages, and definitely not in Python or C++. Among other things, it allows creating domain-specific languages; JuMP and Turing are, I think, examples.
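The second point above, taking a gradient through code that was never written for Flux, can be sketched with Zygote (Flux’s AD engine). `mean_squared` here is an ordinary function over plain arrays, my own toy example:

```julia
using Zygote

# An ordinary Julia function: no Flux-specific types anywhere.
mean_squared(xs) = sum(x^2 for x in xs) / length(xs)

# Zygote differentiates it as-is, on a plain Vector{Float64}.
grad = Zygote.gradient(mean_squared, [1.0, 2.0, 3.0])[1]
# analytically, grad should be 2 .* xs ./ length(xs), i.e. [2/3, 4/3, 2.0]
```

The equivalent in PyTorch would require rewriting the function against torch tensors first; here the plain-datatype version is differentiable directly.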

Multiple dispatch, unicode-latex variable names, other things are also beautiful, though in my opinion they give smaller productivity increases versus the 10x things mentioned above.

However, I did not find it effortless. There is a lot to learn, and Flux itself has changed rapidly – over the last year there was a transition from the older Tracker (somewhat similar to PyTorch) to Zygote (which allows plain datatypes, as mentioned above). Some of the examples are not up to date, and I think the same is true for a bit of the documentation. It seems to be moving fast, however.

Also the Flux community seems (in my perception) to be mostly focused on differential equations, not so much on machine learning.

Because of the example+documentation problem, several people have recommended just doing a GitHub code search (extension:jl “using Flux”, sorted by most recent) to see what fluent people are actually doing. This has been quite helpful.

Knet has a smaller community. It’s a partial tribute to Julia (as well as the Knet and Flux authors) that these packages are potentially competitive with PyTorch, with probably 100x less person-work. As far as I know, Knet’s autodiff is similar to PyTorch’s and does require a custom array datatype, though standard operations can be used on it.


Hello and welcome!
I have been using Knet for the past year or so for training conv nets and I’m happy with it. It might not be as mature as PyTorch (for example, it was missing the skip-connection layer, but I rolled my own quite easily) but it gets the job done. One thing where both packages lag behind a little is the ability to load ONNX models (don’t get me wrong, there are Julia packages for this, but you won’t always be able to load any model you want, most probably because of some funky layer that was not implemented yet in Knet or Flux). In that situation you might need to write your network and load data by hand, or, if you’re not in a hurry, contribute to the packages to make them better.
Back to Knet. Since its user base is smaller, you will find fewer answers and tutorials by Google search. I recommend the documentation and the examples from GitHub (I come from Matlab and was used to only searching on Google; now my mindset has changed and I dig inside GitHub repos: I usually find what I need and, as a bonus, get to look at the implementation). Yes, you won’t get answers to specific questions as fast, but the benefits are greater from my point of view.
About the debugger: I mostly use Juno with Atom, where I can use the graphical interface (step in, step over, etc.). You have a little tickbox where you can switch between interpreted mode and compiled mode (faster, but will stop at breakpoints only in the current function). If you are a bit patient, you will be able to “step” your way even into the backpropagation pass of Knet, and see how gradients are taken and how the optimizer updates those matrices. It’s cool! But this takes a bit of time to get accustomed to, until you understand which expressions to skip and which you should step into. I find it nice for learning purposes, but you will not need this in your day-to-day work since the core is quite stable.
About what you can achieve (or more like, what I achieved in my one-year learning session about conv nets with sporadic efforts): I managed to roll my own networks for traffic-sign classification, augment data, and train on GPUs. Using already available YOLOv2 code, I’m working on extending this to real-time detection. And yes, I managed to train this network on my local GPU with Knet.
To summarize, there will be cases where it’s not all copy-paste-run, but the community and the learning benefits outweigh these inconveniences (valid for Knet, but also for Flux).


We added a debugger in the IDE to the VS Code extension recently as well! See here.

The main underlying debugging engine in Julia right now is JuliaInterpreter.jl. We then have three different front-ends: Juno, the VS Code extension and the REPL Debugger.jl. The three front-ends are independent of each other.

While it is great how much progress we have made with debugging in Julia, I do think it is important to point out that this is an area that is still very rough. If you come from Python and are used to some of the excellent debuggers there, then just be warned that none of the options in Julia-land right now will give you an experience that is as smooth, polished, and fast.


I’d just like to add that in the very short term (until some tracing and compiler stuff gets done), you’ll often find that typical GPU-heavy models will be slower, with CPU ones being faster. This is mostly just due to memory management.


Do you mean that:

  1. Models on GPU will be slower than on CPU, or
  2. Models on GPU in Julia will be slower than the same models on GPU in PyTorch, while models on CPU in Julia will be faster than models on CPU in PyTorch?

Also, can you elaborate on “some tracing and compiler stuff gets done”? I have some features pending improved performance of code compilation / tracing, so any news on that front is appreciated.

As someone who is also doing healthcare/medical research, these are early but exciting times in the Julia space! @dilumaluthge recently announced, so maybe something like MetalHead.jl or the Flux model zoo is in order.

WRT frameworks, you may want to check out @dhairyagandhi96’s Torch.jl. It’s not the full PyTorch API, but as I understand it should (eventually) allow you to use many of the kernels libtorch exposes.


Didn’t know that, but that’s excellent! Keep up the good work :smiley:

It’s not either/or; see DiffEqFlux.jl: “Neural Ordinary Differential Equation (ODE)? […] This looks similar in structure to a ResNet, one of the most successful image processing models.”

The Neural Ordinary Differential Equations paper has attracted significant attention even before it was awarded one of the Best Papers of NeurIPS 2018. […]

What do differential equations have to do with machine learning?


Yes, that paper (Neural Ordinary Differential Equations) was very important and innovative. If someone put it on a list of the 20 most innovative ideas in the history of deep learning, I would not say that is out of place. It has not had much practical impact yet, but in some cases new ideas take some time to spread.

But it seems to be the exception: among the ~20,000 (a guess) papers published each year, there are tens or perhaps even 100 that are also important and innovative but receive no attention here.


This Estadistika post, Exploring High-Level APIs of Knet, Flux, and Keras, is a good side-by-side comparison, with favorable performance results


Please note that the model used for benchmarking is quite tiny:

The model that we are going to use is a Multilayer Perceptron with the following architecture: 4 neurons for the input layer, 10 neurons for the hidden layer, and 3 neurons for the output layer.

Most likely, the timing for TF consists mostly of initialization overhead.
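For reference, the quoted 4-10-3 architecture is small enough to write in a couple of lines of Flux (the layer sizes come from the article; the activation choices here are my assumption):

```julia
using Flux

model = Chain(
    Dense(4, 10, relu),   # input layer → hidden layer
    Dense(10, 3),         # hidden layer → output layer
    softmax)              # turn outputs into class probabilities

model(rand(Float32, 4))   # returns a 3-element vector summing to ≈1
```

At this size, framework overhead (session startup, graph/JIT compilation) dominates the measured time, which is why the benchmark says little about large-model performance.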


Absolutely… it’s not a fair comparison. However, the review in general is awesome and I love the triple implementation in Keras/Knet/Flux. The only thing missing is PyTorch :wink:


@Alon Pytorch can be added.
QuantEcon has:
A numerical cheatsheet comparing: Matlab-Python-Julia.
A statistics cheatsheet comparing: STATA-Pandas-R.
Potential users (like you) could benefit from a deep learning cheatsheet comparing:
Knet-Flux-Keras-Pytorch …
This could have a nice place in the Flux readme?