Machine learning in Julia decouples the tensor library from the autograd library: what is the practical benefit?

A typical machine learning framework is built in layers: first you have a tensor library that expresses computation (matrix multiplication, among other things), then you add automatic differentiation on top, and perhaps accelerated linear algebra.

Julia’s approach is more theoretically elegant: you have a tensor/utility library and an autograd library, and then you package them together. In principle, you can combine any tensor library with any autograd library (and perhaps with any compiler extension that speeds up linear algebra and so on) and it’s supposed to just work.
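As a concrete illustration of that decoupling, a plain array function that knows nothing about any AD system can be differentiated from the outside by an autograd package such as Zygote (a minimal sketch; `loss` is a made-up example function, `Zygote.gradient` is the package's real entry point):

```julia
using LinearAlgebra
using Zygote

# A plain Julia function over standard arrays: no framework buy-in,
# no special tensor type, no autodiff annotations.
loss(W, x) = sum(abs2, W * x)

W = randn(3, 3)
x = randn(3)

# Zygote differentiates it externally, returning one gradient per argument.
gW, gx = Zygote.gradient(loss, W, x)
```

The same `loss` could, in principle, be handed to ForwardDiff or Enzyme instead, which is exactly the composability being described.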

However, in practice, you encounter mutation nightmares, gradient correctness problems, implementation difficulties, and a bunch of other issues.
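The mutation problem in particular is concrete: reverse-mode packages like Zygote cannot trace through in-place array updates, so idiomatic Julia code can fail at differentiation time (a sketch; the exact error text may vary across Zygote versions):

```julia
using Zygote

# Idiomatic Julia: accumulate into a buffer in place.
function inplace_sum(x)
    acc = zeros(eltype(x), 1)
    for xi in x
        acc[1] += xi          # in-place mutation
    end
    return acc[1]
end

# Zygote.gradient(inplace_sum, [1.0, 2.0, 3.0])
# throws an error along the lines of "Mutating arrays is not supported"

# The non-mutating rewrite differentiates fine:
g, = Zygote.gradient(x -> sum(x), [1.0, 2.0, 3.0])
```

Working around this restriction, or verifying that a workaround still produces correct gradients, is part of the practical cost being described.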

Julia decouples the tensor/utility functions, the JIT/XLA-style compilation, and the autograd, which is indeed very elegant. But is there any major practical benefit to this approach that makes it worth the added difficulty? What is it? Or is it this way because the Julia ML community took the Haskell philosophy of doing difficult things for elegance alone?


Your analysis is correct about the differences between Julia and other ML ecosystems like Python.
To me, the main practical benefit is that you don’t have to duplicate libraries in every tensor framework. You don’t need both a Diffrax and a torchdiffeq to suit JAX and PyTorch: you can have a single DifferentialEquations.jl into which you pour all of the community’s efforts, and then make minor adjustments to ensure compatibility with various autodiff backends.

In this regard, projects like DifferentiationInterface.jl are (in my biased opinion) a key requirement to reach the right level of abstraction and separate concerns between tensors and gradients.
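For reference, DifferentiationInterface.jl makes the backend an ordinary value passed to a generic call, so downstream code like a solver never hardcodes one AD package (a sketch of its `gradient(f, backend, x)` API; the backend structs come from ADTypes and are re-exported):

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

# The same generic call, dispatched to two different AD backends:
g_forward = gradient(f, AutoForwardDiff(), x)
g_reverse = gradient(f, AutoZygote(), x)
```

A library can accept the backend as a user-supplied argument, which is the "separate concerns between tensors and gradients" point above.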

We discuss this a little in our JuliaCon 2024 tutorial on autodiff.


The way you’re describing Julia packages sounds like the concept of composability, and you’re exaggerating its applicability and peculiarity. Composability isn’t easy: each component must accurately specify its API, and other components must adhere to that API to work with it; if that’s not possible, then they are incompatible.

Any large project will compose independent components, that’s not at all unique to Julia. The question is whether those components are exposed, and there are good reasons for not doing that; for example, Python users may not benefit from a Python library exposing its C libraries. Developers might, but Alphabet and Meta have other incentives to not rely on each other’s tools too much.

This has little to do with the practical issues you mentioned; those come down more to design and development resources (Big Five money and infrastructure go a long way). Julia developers could have pooled everything into one package, from the start or now, and it wouldn’t solve anything.

I think most answers will focus on the technical details of the different results. In my opinion, though, the question and its answer are basically a management question.

When designing a complex system, one can balance two approaches:

  1. Design and develop the system in total coherence. Everything is synchronized at each step; all efforts are focused and managed top-down.
  2. Design and develop the system in a distributed manner. Design is based on a rough API, and each effort works on its own.

In reality, development is a combination of the two: more gray than black and white. This is the art of management and systems engineering.
As a guideline, one tends toward (1) when time and resources are limited, and toward (2) when the robustness of the solution is more important.

This reflects many areas of development in the Julia ecosystem.
While most will say resources are limited ("how can we compete with JAX?"), the community still mostly designs projects and coordinates efforts via (2).
There are good reasons for this, mostly that most packages and development efforts are research-oriented rather than production-oriented.
Still, if one wants to maximize the yield of a given effort, balancing more toward (1) is the way to go.


That’s a good point! Saving a great deal of duplicated effort is how we win anyway. I feel like Julia is a dark horse in the machine learning world right now.