I have also switched from PyTorch.
Within a few years I think the strengths of Julia will place it far ahead of PyTorch and the others:
- PyTorch requires the underlying code to be written in C++/CUDA to get the needed performance, which is roughly 10x as much code to write.
- With Flux in particular, native data types can be used. This means that you can potentially take the gradient through some existing code (say a statistics routine) that was never intended for use with Flux; there is a small sketch of this after the list. To do the same with PyTorch you would have to re-code the equivalent Python to use `torch.xx` data structures and calls. Because of this, the potential code base for Flux is already vastly larger than for PyTorch.
- Metaprogramming. I think there is nothing like it in other mainstream languages, and definitely not in Python, nor in C++. Among other things it allows creating domain-specific languages; JuMP and Turing are, I think, examples (a toy illustration also follows the list).
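To make the "native data types" point concrete, here is a minimal sketch of my own (not taken from any Flux example) of Zygote, the AD engine behind current Flux, differentiating a plain Julia function that knows nothing about Flux:

```julia
# Minimal sketch: Zygote differentiating ordinary Julia code.
using Zygote

# A plain "statistics routine" on a standard Vector{Float64};
# nothing in this function refers to Flux or Zygote.
variance(x) = sum((x .- sum(x) / length(x)) .^ 2) / (length(x) - 1)

x = [1.0, 2.0, 4.0, 7.0]

# gradient returns a tuple with one entry per argument;
# here g is the gradient of the sample variance w.r.t. each element of x.
g, = Zygote.gradient(variance, x)
```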
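And a toy illustration of the metaprogramming point (again my own example; JuMP and Turing do far more than this, but they are built on the same mechanism): a macro receives the expression itself rather than its value, and can transform it before it runs.

```julia
# A macro sees the unevaluated expression `ex` and can rewrite it.
macro trace(ex)
    quote
        println("evaluating: ", $(string(ex)))  # show the source expression
        $(esc(ex))                               # then evaluate it in the caller's scope
    end
end

@trace 2 + 3 * 4   # prints "evaluating: 2 + 3 * 4" and returns 14
```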
Multiple dispatch, Unicode/LaTeX variable names, and other features are also beautiful, though in my opinion they give smaller productivity increases than the 10x items mentioned above.
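For completeness, a quick illustrative taste of those two (just a sketch of mine, nothing from the post above): the method is selected from the types of all arguments, and identifiers can be entered as Unicode via LaTeX-style abbreviations in the REPL and most editors.

```julia
# Multiple dispatch: one generic function, the method is chosen from the
# types of *all* arguments, not just the first.
describe(x::Number, y::Number) = "two numbers"
describe(x::String, y::Number) = "a string and a number"
describe(x::Number, y::String) = "a number and a string"

describe(1, 2.0)     # -> "two numbers"
describe(1, "two")   # -> "a number and a string"

# Unicode names, typed as \sigma<TAB>:
σ(x) = 1 / (1 + exp(-x))   # logistic sigmoid
σ(0.0)                      # -> 0.5
```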
However, I did not find it effortless. There is a lot to learn, and Flux itself has changed rapidly: over the last year there was a transition from the older Tracker (somewhat similar to PyTorch) to Zygote (which allows plain data types, as mentioned above). Some of the examples are not up to date, and I think the same is true for parts of the documentation. It does seem to be moving fast, however.
Also, the Flux community seems (in my perception) to be mostly focused on differential equations, and not so much on machine learning.
Because of the examples-and-documentation problem, several people have recommended just doing a GitHub code search (extension:jl "using Flux", sorted by most recent) to see what fluent people are actually doing. This has been quite helpful.
Knet has a smaller community. It is a partial tribute to Julia (as well as to the Knet and Flux authors) that these packages are potentially competitive with PyTorch despite probably 100x less person-work. As far as I know, Knet's autodiff is similar to PyTorch's in that it does require a custom array/parameter data type; however, standard operations can then be used on it (a hedged sketch follows).
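For comparison, here is my understanding of the Knet style, based on the AutoGrad.jl README (the `Param`, `@diff`, and `grad` names are AutoGrad's API as I read it; treat this as a sketch rather than gospel): values to be differentiated are wrapped in a tracked `Param` box, but ordinary operations then apply to it unchanged.

```julia
# Sketch of Knet-style autodiff via AutoGrad.jl (my reading of its docs).
using AutoGrad

w = Param([1.0, 2.0, 3.0])   # wrap the array so operations on it are recorded
loss = @diff sum(abs2, w)     # run and record the computation
grad(loss, w)                 # -> [2.0, 4.0, 6.0]
```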