Where does Julia provide the biggest benefits over other ML frameworks for research?

@trappmartin I agree that, for me, maybe the most important point is how easy it is to use Julia/Python to develop our own models. Python has many great libraries (scikit-learn, numpy, pandas, PyTorch, …). However, I consider Julia the better option in several cases because:

  • Python requires NumPy, and you have to use it a lot for performance. In Julia that functionality is built in (supplemented by Distributions and other small, nice packages). More importantly, you do not need to use vectorised operations to get good performance (see the sketch after this list).
  • Working with data is also easier: pandas is great and powerful, but for me DataFrames is a lot more intuitive (and Query is great).
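To make the loop point concrete, here is a minimal sketch (my own illustrative example, not from the original post): an inherently sequential update that could not be vectorised away anyway, but that still runs at native speed as a plain Julia loop.

```julia
# Plain scalar loop: each step depends on the previous one, so there is
# nothing to vectorise, yet it compiles to a tight native loop.
function logistic_map(r, x0, n)
    x = x0
    for _ in 1:n
        x = r * x * (1 - x)
    end
    return x
end

logistic_map(3.7, 0.4, 10^8)   # no NumPy-style tricks needed for speed
```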

The drawback is that libraries in Julia are not as mature, and sometimes you are missing some functionality. In deep learning Julia is not badly positioned (with Flux, for instance), but there is still no alternative to scikit-learn and all its ML models in Julia (though you can use it from Python).
The same goes for tooling and documentation (in my opinion, Julia users should help the community by improving the documentation of packages, especially with examples).

2 Likes

That’s too simplistic. There’s a lot of mature Julia libraries, and a lot of immature Python libraries. At this point it’s dependent on which one you choose.

3 Likes

There’s a lot of mature Julia libraries

Which Julia libraries do you consider mature?

Because I’ve run into issues using some pretty fundamental ones, such as Plots, for example.

I first started using Julia a couple of years ago and liked many things about it, but hadn’t used it in a while for various reasons. Recently I’ve started using it again, and once again ran into issues that required effort to work around (more, probably, than if I were doing the same things in Python). That’s why I’ve started trying to take a deeper look into the project to see if there’s anything I can do to help it make progress in robustness and usability.

2 Likes

Plots is definitely not mature. It’s a little problematic but it’s one of the more feature-complete things out there. :man_shrugging: We do need to work on plotting.

But what about

  • Optim
  • NLsolve
  • DifferentialEquations
  • CUDAnative
  • CuArrays
  • JuMP
  • NLopt
  • IterativeSolvers
  • BandedMatrices
  • FFTW
  • Cubature
  • Cuba

And the list can keep going. Notice that a lot of these libraries had their last significant change when they updated to Julia v1.0, and most of the changes they have had since then do not pertain to their standard features.

5 Likes

Well the last time I tried to use DifferentialEquations, which I think was about a year ago, I couldn’t get Hamiltonian problems to work.

@ChrisRackauckas You are right that Julia has some mature libraries (I love JuMP, for instance), but I was referring to the ML context.

The big differences from a few years ago are that (1) the language is fully stable and (2) the new package manager with Manifest files is incredibly powerful. If you have something working, you check the manifest into git and can recreate the environment at will. Then you only upgrade that manifest when absolutely needed. Other projects can have different versions of packages in different working snapshots.
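For anyone unfamiliar with that workflow, a minimal sketch (the project path is illustrative): recreating the exact versions recorded in a checked-in Manifest.toml is just two commands.

```julia
using Pkg

# Activate the project whose Project.toml / Manifest.toml are checked into git.
Pkg.activate("path/to/MyProject")

# Install exactly the package versions recorded in that Manifest.toml.
Pkg.instantiate()
```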

5 Likes

A year ago we were in the middle of our v1.0 transition (August to November or December ish), so I’m not surprised. I’m not sure what that has to do with now, though, and even then there weren’t breaking changes to the most commonly used APIs.

7 Likes

Differentiable programming convinced me to learn Julia. I suspect it’s going to allow much more domain-specific knowledge to be exploited. While deep learning was taking off, it was so powerful that “hey, let’s throw this into a vgg/resnet/retinanet!” worked well, probably better than “how can I optimally design a network for my task?”

The simplicity of neural ODEs in Julia with Zygote.jl blew me away: prior “domain-specific knowledge” of ODEs (e.g. previously implemented ODE solvers) could be exploited out of the box.
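As a rough illustration of why that composability works (my own minimal sketch, not the poster’s code, and using ForwardDiff rather than Zygote): because the solvers are generic Julia code, you can differentiate straight through an off-the-shelf ODE solve.

```julia
using DifferentialEquations, ForwardDiff

# Exponential decay du/dt = -u, solved from a given initial condition.
decay(u, p, t) = -u

function final_state(u0)
    prob = ODEProblem(decay, u0, (0.0, 1.0))
    solve(prob, Tsit5(), save_everystep = false)[end]
end

# Sensitivity of the state at t = 1 with respect to the initial condition.
ForwardDiff.derivative(final_state, 1.0)   # ≈ exp(-1)
```

The same genericity is what packages building neural ODEs on top of the existing solvers rely on.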

11 Likes

Thanks, everyone, for your points. I’d like to push back on some of them.

@anon92994695 I appreciate the anecdote, but many people would say the same thing about PyTorch, which makes it hard to compare. You provide a couple of examples of what your language can do (NeuralODEs) and how easy it is to use, but from an outside perspective I don’t have a frame of reference. I’m not sure how hard torchdiffeq was to implement, nor how easy your library is to use compared to the alternatives.

@ChrisRackauckas Your blog posts were very helpful (especially the second one). Note, I don’t have much experience with scientific computing.

Essentially, the biggest advantage I see here for Julia is that for a certain class of libraries (linear algebra, differential equations?), one can write general libraries that will then be automatically differentiated. This is an advantage that Python cannot have, as Python libraries typically are built on C++.
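As a hedged sketch of what that can look like (the trapezoid example is mine, not from the thread): Zygote will differentiate ordinary Julia code, loops and all, even though the function was never written against an AD framework.

```julia
using Zygote

# A hand-written "library" routine: trapezoid-rule integration of f over [a, b],
# written as ordinary Julia code with a plain loop.
function trapezoid(f, a, b; n = 1000)
    h = (b - a) / n
    s = (f(a) + f(b)) / 2
    for i in 1:n-1
        s += f(a + i * h)
    end
    return s * h
end

# Differentiate the integral with respect to its upper limit: d/db ∫₀ᵇ sin(x) dx = sin(b).
Zygote.gradient(b -> trapezoid(sin, 0.0, b), 1.0)   # ≈ (sin(1.0),)
```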

The other big advantage is when it comes to scalar code. I was not aware this was a use case at all, but if that’s what you’re doing, I can obviously see where this minimal-overhead approach shines.

On the other hand, your response is somewhat orthogonal to my initial question. The scientific computing/ML intersection is somewhat niche, but there I can see why Julia would be the superior option. I believe Chainer actually had dynamic automatic differentiation all the way back in 2015 :^) Regarding TensorFlow.jl, most of its points are not very convincing to me as a PyTorch user.

Originally I was going to respond to all the comments, but then I delayed too long and there are too many comments now (also I can’t mention more than 2 people) :'(. So I’ll respond to what I consider the main points:

  1. It’s easier to develop new machine learning algorithms in Julia than in PyTorch. I’m not convinced this is true. If you’re coming up with a new operation (say… capsule networks) then Julia isn’t going to cut it either - I don’t believe Julia can generate code that’s within an order of magnitude of what a hand-written implementation can reach. If Julia could, I would be extremely impressed. For other things, I’m not sure what’s easily done in Julia that can’t be easily done in PyTorch. Can you give some examples?

  2. Having the entire stack be written in Julia allows users to easily debug all parts of their stack. This is cool, and I think it’s definitely a plus for Julia. However, most researchers don’t really care about this. It’s fairly rare that a researcher needs to even think of debugging the internals of PyTorch. If this were common, I agree that it’d be a significantly larger plus for Julia.

  3. Having the entire stack be written in Julia allows for arbitrary code transformations that aren’t possible in Python. This would be a killer point for Julia, if it were a big deal. However, so far I’m not convinced that it is.

1 Like

What is “hand-written” here? CUDA kernels? “Julia” doesn’t write anything for you, it is you who writes Julia code by hand, so I don’t see how Julia isn’t a suitable tool for rolling your own operations, and yet something else is, since I can use CUDAnative to hand-write extremely performant kernels with the comfort of Julian syntax and semantics.
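For concreteness, here is a minimal sketch along the lines of the standard CUDAnative vector-add example (not code from this thread; it assumes a CUDA-capable GPU with CUDAnative and CuArrays installed):

```julia
using CUDAnative, CuArrays

# A hand-written GPU kernel in plain Julia syntax: elementwise c = a + b.
function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = cu(rand(Float32, 1024))
b = cu(rand(Float32, 1024))
c = similar(a)

# Launch with 4 blocks of 256 threads each, covering all 1024 elements.
@cuda threads=256 blocks=4 vadd!(c, a, b)
```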

If researchers are limiting themselves to the basic operations that every other ML framework supports (and probably just proxies to a library like CUDNN), then sure, there’s no strong case for Julia here, because the other frameworks have probably spent lots of effort on giving good error messages for the most common mistakes. But the moment you step outside the comfort zone of pre-canned solutions and run into an error somewhere in someone’s logic, what would you prefer: using a single debugger to debug all the way down to the deepest layers of code causing the issue, or having to switch between two different debuggers because the internals suddenly drop to C/C++? This might sound like a trivial inconvenience right now, but I doubt you’d feel the same when you’ve spent days pulling your hair out over a subtle numerical issue or a bug that’s hard to reproduce.

8 Likes

I’ll come at this from a different angle (this thread already has plenty of finer points on the utility of the choice). I simply like and enjoy the Julia language. When I learned Python it felt clunky and non-intuitive; it still does. IME the choice of programming language carries a lot more irrational weight than we give it credit for, and ultimately my choice of Julia over Python is more about aesthetics and feel; utility and finer points are rationalized after the fact. Don’t get me wrong though, I find the reasons for preferring Julia over Python compelling (of course, depending on the context I may prefer PyTorch or TensorFlow over Julia, but I consider those specialized projects).

10 Likes

This depends on your definition of “ML research” and “debug.” I don’t think the former is relegated to just using framework built-ins. Plenty of ML research involves more complicated AD or models. And yes, no researcher has any desire to look at PyTorch’s source code, because it is so complex. But debugging code often means looking at the source code of the underlying frameworks to determine why your code isn’t working. If you use PyTorch, this involves hunting through documentation and StackOverflow posts, because the source code itself is too dense to easily comprehend. With Flux, this is as simple as opening the Github page and going to the line number indicated by the stack trace.

That’s the whole point. We agree that researchers value their time and want to test new ideas instead of code-monkey some library. A simpler codebase in a single language means more time spent on research and less time spent on meaningless programming details.

3 Likes

My personal list of issues with ML in PyTorch looks like this:

  1. In 30% of cases the code just works.
  2. In another 20% of cases I forget to call zero_grad(), item(), detach() or to(device). I wish there were a higher-level API like in Keras (I’m looking at skorch right now), but such issues are usually resolved within 10-15 minutes.
  3. 10% of issues are caused by a poor choice of optimizer and learning rate. Sometimes it takes half an hour to make my loss start going down.
  4. 20% are caused by numeric issues like vanishing gradients, division by nearly zero, etc.
  5. In the remaining 20% of cases it’s just my own mistake.

I never had to debug building blocks of PyTorch, only concrete models created using it. And, honestly, PyTorch is much better in this regard since it has a lot more implemented (and sometimes pretrained) models than any framework in Julia.

But training a good ML model is just a part of the workflow. Usually my job includes:

  1. Data analysis. Normally I use SQL + Pandas (Python), but Pandas only helps with a sample, since the whole dataset doesn’t fit into memory.
  2. Data preparation for ML. Usually I have to use SQL or Spark with Scala (PySpark has poorer support on big clusters like Amazon EMR), but some people point to Python’s Dask.
  3. Other data utilities. For example, today I implemented a custom KD tree for data anonymization purposes. I used Python, since in Java/Scala it would be much harder to do, but the performance is quite poor.
  4. Exposing trained models to production via HTTP. Normally we use Java, Scala, Go or at best Python frameworks over C servers, but never pure Python servers.

We have to use this zoo of technologies since none of them alone supports all our needs, and it’s unlikely any of them ever will. Julia in its current state also doesn’t solve many of these problems, but at least - and this is the main reason I continue to invest in it - Julia has a chance to become a one-stop solution. And the best thing is that contributing to these tools is much, much easier in Julia than in Scala or in the C backend of Python libraries.

16 Likes

@dfdx this is another excellent reason why Julia is so promising. 80-90% of data science is spent dealing with the data. Python is quite literally dependency hell, and Scala/Java just aren’t healthy for machine learning. I remember firing up pyspark with pyarrow and torch only to hit some low-level issue (maybe torch was trying to backprop through the compression algorithm - no one knows); it took a day to diagnose the 1k-line error message and work around it. Or someone writing Cython to speed up horrifically slow single-threaded feature manipulations, and then having to support that Cython code, with all of its dependency issues, until the end of time.

In Julia, everything is made to work via Julia. Time saved in R&D is great, but time saved in ETL is critical as well. Although Julia still has a bit of work to do, I’ve been able to scale out JuliaDB incredibly easily and with far less trouble than similar technologies, e.g. Dask (so many horrifying and unexpected errors) or (Py)Spark (ew).

It seems like the main complaint people have against using Julia is that it requires some level of expertise with the tools being used and with the language itself, i.e. if a tool is missing, you have to write it. For some people this is not a possibility; for others it’s a super fun opportunity.

16 Likes