I’ve read a bunch of the blogs on the Julia website (e.g. on machine learning and programming languages), but I remain unconvinced about what large benefits Julia provides over PyTorch.
For example, one section in the blog post posits Julia as something that allows for the usability of PyTorch without the Python interpreter overhead. While that’s great for inference use-cases, I think the results have shown that researchers don’t care about the negligible Python overhead compared to the overhead of actually writing the model.
Looking at the Flux.jl GitHub page, I don’t see many places where it differentiates itself from PyTorch. Much of the “unusual architectures” section seems to position Julia as an alternative to TensorFlow, but I don’t see the benefit compared to PyTorch. The example about the sigmoid is easily handled by any kind of fuser (like the PyTorch JIT).
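To be concrete about the example I mean (a rough sketch on my part with made-up sizes, not code from the blog post), in Julia the fused element-wise kernel falls out of dot-broadcasting:

# a hand-written sigmoid applied with a fused dot-broadcast;
# the .+ and σ. collapse into a single loop/kernel
σ(x) = 1 / (1 + exp(-x))

W = rand(Float32, 5, 10); b = rand(Float32, 5); x = rand(Float32, 10)
y = σ.(W * x .+ b)

But a fuser like the PyTorch JIT recovers essentially the same thing on the Python side, so I don’t see this as a differentiator.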
Being able to write new operators in Julia doesn’t particularly convince me either. In order to get solid performance you need to go beyond just writing naive loops in C++/CUDA; you need a system more like Halide/TVM/PlaidML.
I do see an advantage in writing Julia if you truly need fundamental data structures (trees/maps/whatever) embedded in the core of your machine learning system. However, (a) I don’t particularly see the need for that now, and (b) I feel that any approach like that will face massive performance difficulties.
I think that perhaps I don’t understand Julia’s pitch well enough, or that many of the posts I’ve been reading have been focused on differentiating Julia from TensorFlow and not PyTorch. Could someone help me out?
Originally posted on Slack - redirected here.
PS: Also, to be clear, I have similar questions about Swift.
I don’t know much about machine learning, and I know even less about the differences between PyTorch and Flux, but I think one thing to understand is that even if Flux provided no additional benefit to end users relative to PyTorch, it could still be a very important and worthwhile piece of software. The reason is that it is much easier for developers to work on the internals of Flux than on those of PyTorch, so there is a much lower barrier to entry for machine learning experts to contribute.
In PyTorch, the developers have to build their own JIT compiler, their own basic data structures, and everything else; they need to maintain an entire language embedded inside Python. Flux developers are able to just rely on the Julia devs to work on the language side, leaving them ample time to work on the machine learning side of things.
Notice that Flux.jl has a tiny (but very skilled) pool of contributors and started way after PyTorch and TensorFlow, both of which are gigantic enterprises with huge amounts of money and programmers, and yet Flux is already doing some things PyTorch and TensorFlow can’t do and is very quickly closing any gaps where it’s lacking.
So even if Flux doesn’t do anything that’s all that special for you right now, I think it may be worth using because it’s attracting the sort of people who are going to build things you will want to use; those people are noticing that their efforts can be more directed and their time better spent by working on Flux.
To be a bit adversarial, there are two reasons that argument isn’t very convincing to me.
Although it’s true that ML researchers would have major difficulties if they wished to work on the JIT or other internal components (they’d need to write a lot of C++), much of what ML researchers wish to contribute is likely to be high-level wrappers over the Python abstractions.
Python/C++ are inherently more popular languages than Julia, and one of Julia’s biggest hurdles in general will be convincing Python people to use Julia in the first place.
Regarding the point that Flux.jl is very small and started way after PyTorch/TensorFlow: I suspect that the relatively large size of the PyTorch team (which is still probably an order of magnitude smaller than TF’s) is merely an artifact of the project having grown. For a long time, the PyTorch team was probably <= 3 people as well.
Personal Examples:
Although the paper I’m going to show you isn’t cutting-edge, amazing machine learning, I did do all of the work for it in Julia: https://arxiv.org/pdf/1907.11129.pdf
If that’s what you’re looking for, then yes Julia doesn’t have major advantages. I mention that in this blog post with some measurements:
Everything is big matmul? Who cares what language you use! However, there is a lot of machine learning which is not standard “ML”: there’s a whole emerging field of scientific machine learning which is integrating ML with PDE solvers and the like. In that case, you need an AD system which is:
Able to handle existing packages not made for ML, like someone’s random climate model.
Able to handle non-functional programming styles. Most automatic differentiation doesn’t handle mutation well, for example, but all of the Fortran-style PDE solvers basically have to use it.
Able to handle scalar operations well, since a lot of nonlinear functions have to be described in scalar terms.
PyTorch and TensorFlow don’t handle this. Julia does have a lot of tooling in this domain; it is still a specialized domain, but it’s an interesting one.
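To illustrate (a minimal sketch of mine, not from the post above): forward-mode AD in Julia will differentiate straight through a hand-written, Fortran-style Euler loop, mutation and scalar arithmetic included.

# differentiating a hand-written Euler loop with ForwardDiff.jl,
# in-place mutation and plain scalar ops and all
using ForwardDiff

function simulate(p)
    u = ones(eltype(p), 1)            # state vector, mutated in place each step
    dt = 0.01
    for _ in 1:100
        u[1] += dt * (-p[1] * u[1])   # du/dt = -p₁ * u
    end
    return u[1]                       # final state
end

g = ForwardDiff.gradient(simulate, [0.5])   # d(final state)/dp₁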
As Chris has said, it’s entirely possible that you have a workflow and needs for which Julia has little to offer you that differentiates itself from other languages.
In truth, Julia’s value proposition at this stage of its life is mostly centred around making developers’ lives easier and letting them focus on their specialties instead of reinventing a million kludgy, suboptimal wheels.
For people whose primary intent is to apply functions from packages to data in the manner intended by the package writers, the benefits of julia come down to this:
Julia makes the transition very smooth from an end user who strings together package functions and applies them to data, to someone who writes or contributes to the internals of non-trivial packages. Because julia is a very fast, productive language to write, most high-performance, heavy-duty julia packages out there are written in pure julia (not the case for Python!), and that code is often surprisingly readable once you get used to it (the same goes for most of julia itself). So people who use julia tend to ‘peek under the hood’ a lot more and end up gaining proficiency in more advanced techniques relatively fast.
I think a lot of people imagine that they’d be happy just using well defined package functions for everything right up until the moment they realize that they just started working on a problem for which no nice package exists anywhere. At that point, the user is going to have to roll up their sleeves. This is the sort of work that julia really excels at. Everything else is going to be rather contingent on what specific sub-field you work in.
Because everyone talks about Flux, I wanted to point out two other ML frameworks in Julia and their advantages. Knet
Watch the beginning of this video for some insight on why Julia has made things easier: It had dynamic automatic differentiation in 2017!
1.) It’s a language that is quite kind to package developers. In my experience, it’s much easier to write high-performance code in Julia than in R. For context, I’ve been using R/Rcpp for 10 years and Julia for less than a year(!). I’m also using Python these days, and I certainly don’t think it’s as easy to write high-performance code in Python.
2.) It’s a language that, at least in typical ML/Statistics/DS usage, doesn’t really provide any huge advantage to the user over R/Python/etc. It’s not that Julia is any worse than those languages, but there’s nothing in the language that I’ve seen so far that really favors the users themselves. If you’re just calling fit(myModel), you may not care whether it was very easy for the developer to write the fit method or not; either way, it’s still easy enough for you to call fit. While the developer likely faced the two-language problem, the user typically does not.
In light of (1) and (2), there’s certainly a strong argument that as a current user of ML packages, you may well have more at your fingertips with PyTorch, etc. This is nothing more than a current-inertia argument: there are a lot more people working on PyTorch than on Flux at this moment, and the increased Julia productivity is not yet the 50x needed to make up for the smaller developer team.
The hope for Julia is that enough developers find themselves that much more productive writing Julia that eventually the inertia shifts in Julia’s favor. Given my experience, I think if you had an equal number of Julia developers and Python developers in the world, you’d have way more options in Julia. Circling back to your question “Where does Julia provide the biggest benefit over ML frameworks for research?”: to me, the biggest advantage for ML research is undoubtedly when you are developing novel ML methodology, rather than applying existing implementations of ML models.
Definitely, on the methodology front, developers will enjoy contributing to Flux over PyTorch. But I think there are plenty of benefits on the user-side.
For example, I was trying to recreate this tutorial in Flux just to get a feel for things. Now this code is readily available, but let’s treat it as an example of something you might write for research. Already, PyTorch has you define a custom ReplayMemory which is just a circular buffer. In Julia, I just loaded DataStructures.jl and had a buffer that I could push! onto and sample from. When people talk about the Julia package ecosystem, this is what they mean.
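Roughly what that looks like (a sketch with made-up sizes, not my actual script; the Transition struct just mirrors the tutorial’s fields):

# a replay buffer straight out of DataStructures.jl
using DataStructures: CircularBuffer
using StatsBase: sample

struct Transition
    state::Vector{Float32}
    action::Int
    reward::Float32
    next_state::Vector{Float32}
    done::Bool
end

memory = CircularBuffer{Transition}(10_000)   # oldest entries are overwritten once full

push!(memory, Transition(rand(Float32, 4), 1, 1.0f0, rand(Float32, 4), false))

batch = sample(memory, min(32, length(memory)); replace=false)   # random minibatch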
The actual definition of the DQN is basically the same (though I’ve always thought code like self.head(x.view(x.size(0), -1)) is not intuitively interpretable). So if you limit your comparison to just this section of the code, then it will appear like Julia has nothing to offer.
Moving on, take a look at the optimize_model() code in PyTorch. Only the last five lines of this function are related to updating the weights. Everything before those lines is to compute the Huber loss on the Bellman error defined by the equations above. Here’s what I wrote in Julia:
# helper functions to get the Q and V values for transitions
function Q(transition::Transition)
    a = transition.action > 0 ? 2 : 1
    q = policy_net(transition.state)[a]
    return q
end

function V(transition::Transition)
    r = transition.reward
    v = transition.done ? 0f0 : maximum(target_net(transition.next_state))
    r + γ * v
end

huber(δ) = sum(map(x -> abs(x) <= 1 ? 0.5 * x^2 : abs(x) - 0.5, δ)) / length(δ)
l(q, v) = huber(q .- v)
I think the Julia code reads almost exactly like the mathematical definitions. And just because the Julia code is simple doesn’t mean it isn’t fast: in my experience, Flux runs just as fast as or faster than PyTorch. A DQN program is very much in the realm of current ML, so any framework should make implementing it easy. Imagine if, instead of interfacing with Gym, you had to interface with a climate simulator like Chris suggested. Julia + Flux introduces a lot less friction to make that possible.
I think the difference in code size between Flux and PyTorch isn’t just for developers either. In research, when does anyone’s code just work? You are almost always pushing the envelope in terms of what a language or framework can do. Debugging the train! loop in Flux is easy: it’s pretty much exactly what you would expect it to be. And I can easily fix the issue (and submit a PR if I were really motivated). What if you had to debug the internals of PyTorch? As a researcher, I don’t want to waste my time on that.
If I had a dollar for every time a colleague complained about a PyTorch model from some paper that wasn’t working, I’d buy myself a nice hefty GPU. Following the same logic as (1) and (2), I think Julia + Flux lends itself very nicely to producing reproducible results. Because of (1), your code is simple and easy to interpret. Because of (2), if you are contributing to methodology, you could submit a PR or make a package, and it is likely to plug into a downstream user’s code really easily. This isn’t something that has come about because of a technical reason in Julia or Flux, but due to the unofficial standards followed by the Julia package community. That’s worth counting, in my opinion.
Really, the only two reasons I see for using PyTorch are that Flux is younger and still filling some gaps, and that everyone uses PyTorch. But I think the latter is a self-fulfilling prophecy. You are right that researchers don’t care about the function-call overhead of Python, but I do think researchers care about their own time. And the time from idea => implementation is much shorter in Julia, in my opinion.
As much as I understand what you’re saying, I think a lot of workflows like this are starting to die out due to easy-to-use software/services. The ‘button pushers’ will likely shift to some big corporate suite (Tableau, H2O, Alteryx, etc.), while the people solving ‘real’ problems in data science will always need to hack, and to hack their way to production reproducibly and quickly.
Data science seems to be moving into domain-specific problem solving. Different problem domains require specialized tools. Either you find a tool or series of compatible tools that has them all ($$$, time, and let-down), force a language like Python or R to cooperate to fit those domains (time, effort, nearly inevitable use of C/C++/Java/Scala) and then possibly transition to a production language/software (time, effort), or you write your own in a production-ready language as quickly as possible using a reliable toolset as a base (why not Julia?).
Julia has major advantages in this paradigm. The “hey, this language has a button I can push” paradigm, on the other hand, is being beaten to death by hundreds or maybe thousands of startups, open-source efforts, and established companies. I foresee that line of work being handed off to analysts and model factories. Yes, they exist and are a growing trend thanks to SaaS and the like; think retraining a ResNet/RetinaNet/whatever on your data at the click of a button. It’s when people do smart things outside of typical ‘package’ usage that they form a competitive edge.
Niche, custom, and useful data science/ML is, in my eyes, the way to stay alive long-term. Many will be swallowed by the tide and turn into analysts, data engineers, or software developers.
I’m not sure those points are sufficiently convincing.
I can tell you from experience that I don’t regret doing all my ML research in Julia. In fact, when I started doing so in 2015, I first worked on some ML models using Python and tried Julia simply out of curiosity. What surprised me was that it (1) took me less time to implement the learning algorithm in Julia than in Python and (2) was more than 10 times faster out of the box, even though I had coded some Cython routines and was using NumPy and so on. That was years ago, and a lot has changed for the better since. If you are not a hardcore deep learning researcher, I don’t see much benefit in using Python anymore.
To be honest, for an ML researcher like myself this point is not too important. It is relevant in itself, but people working on ML will often not care too much about the tricks used to make the code run fast. We are more interested in developing machine learning algorithms and models, obviously; we don’t care so much about the implementation details of the framework.
By we I mean ML researchers that work on ML theory or algorithms. Not people that develop frameworks. I do both (work on theory and develop a framework) but most ML researchers I know are only users of frameworks.
I don’t understand. Have you used Flux? It’s very transparent and reads almost exactly like the mathematics behind machine learning.
It’s really not about “tricks”; Flux is more legible than PyTorch, which is itself far more readable than, say, TensorFlow. Of course those are subjective comments, but to me, Flux is about as obvious as it gets. You want to reverse your gradient? Go ahead. You want to do RNNs? Easy. Wanna do a bunch of CNNs? Fire away. Need a really fancy custom loss function? Nothing stopping you. Am I missing something?
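To make the custom-loss point concrete (a rough sketch against a recent Flux version, with made-up layer sizes; not code from this thread): any ordinary Julia function works as a loss, no op registration required.

# a hypothetical Huber-style loss used directly with Flux,
# mirroring the huber defined earlier in the thread
using Flux

model = Chain(Dense(4, 16, relu), Dense(16, 2))

huber(δ) = sum(map(x -> abs(x) <= 1 ? 0.5f0 * x^2 : abs(x) - 0.5f0, δ)) / length(δ)
loss(m, x, y) = huber(m(x) .- y)

x, y = rand(Float32, 4, 8), rand(Float32, 2, 8)
grads = gradient(m -> loss(m, x, y), model)   # reverse-mode AD straight through the custom loss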
Is there some set of missing functionality you wish was available? Maybe some high-level wrappers for cookbook neural networks? If that’s the case, a determined individual could probably have most of the essentials written in a few days or so.
By we I mean ML researchers that work on ML theory or algorithms.
If you’re developing new algorithms, I don’t see how you can rely only on existing frameworks. What if the framework developer didn’t anticipate the needs of your new algorithm?
I’m not arguing against Julia or Flux or Knet. I’ve been using Julia exclusively for my research since late 2015. I’m arguing that the implementation details of a framework, e.g. whether it is implemented in pure Julia or a mix of languages, are not necessarily important for every ML researcher.
One last thing: I think it really depends on your area of research in machine learning. Deep learning is only one of many fields, and probably the one with the least to gain from moving to Julia at the moment. But that might change depending on the topics people focus on. Other areas, like mine, might see larger benefits.