Thoughts on Jax vs CuArrays and Zygote

xiaodai · January 22, 2019, 6:55am

Google’s JAX looks like a combination of CuArrays and Zygote for Python. I wonder if anyone can compare and contrast Jax with CuArrays and Zygote?

I can’t gather how Jax is different apart from the fact that it compiles to XLA.

Janis_Klaise · March 14, 2019, 11:18am

I’m also interested in a comparison. My understanding is that Jax supports AD on a subset of Python+Numpy. I suspect it would be a lot of effort for Jax to keep and expand support, but I’m not sure if Julia and the various AD packages (Zygote in particular) are competitive yet, e.g. on par with functionality in the Jax cookbook: Google Colab

cscherrer · April 8, 2019, 4:18am

Just saw this and I’m curious as well (I don’t know much about Jax). @MikeInnes could you share your thoughts?

MikeInnes · April 8, 2019, 9:59am

Jax is really nice; at this point TensorFlow and PyTorch have converged to being hybrids of each other, with a bunch of different ways of doing the same thing (“eager mode”, “static graphs”). Jax really refreshes and cleans up this design, with well-thought-out semantics and interfaces, along with having a lot of nice ideas around composable code transforms that are very much on our wavelength (c.f. Cassette).

Python still has the fundamental limitations we discussed a while back, though. The major development since then is that explicit graph building has been replaced with eager semantics + tracing; this is much more intuitive but technically largely equivalent. So in TF 2.0, JAX, or PyTorch JIT, if you want control flow and performance, you’ll still need to replace your loops with framework-compatible versions, and this has limitations around recursion, mutation, custom data structures etc., and in particular it can’t differentiate through any other python library code.

The paper on TF 2.0, which shares many ideas with Jax, discusses this a bit as well:

In TensorFlow Eager, users must manually stage computations, which might require refactoring code. An ideal framework for differentiable programming would automatically stage computations, without programmer intervention. One way to accomplish this is to embed the framework in a compiled procedural language and implement graph extraction and automatic differentiation as compiler rewrites; this is what, e.g., DLVM, Swift for TensorFlow, and Zygote do. Python’s flexibility makes it difficult for DSLs embedded in it to use such an approach.

cgarciae · April 25, 2020, 1:46am

I think JAX is a bit more than “just” cuda + autodiff, the XLA compiler also produces highly optimized CPU code (5x faster than numpy on a real usecase I had) with the added bonus bonus that the exact same code also runs on GPU is available.

Maybe Julia can compile to XLA to gain speed on all hardware supported by XLA? I’ve seen XLA.jl but it seemed too focus on TPUs.

xiaodai · April 25, 2020, 2:07am

Logical conclusion. Give up on Python. Coalesce around Julia and make more non-data-science purpose libraries in Julia. That day will come.

jpsamaroo · April 27, 2020, 11:29pm

I think the general direction we’re heading is to use MLIR (an LLVM project) as Julia’s backend compiler and Intermediate Representation. MLIR has “dialects” like Affine which allow representing operations on tensors natively within the IR, as well as optimization passes which can operate on “tensor IR”. So it should be able to get us to a similar level of performance as XLA, while still providing full access to LLVM via the LLVM dialect.

Ratingulate · April 28, 2020, 12:23am

Are you saying the plan is to replace Julia’s IR with an MLIR dialect, or that it will be one of the IRs in the stack?

ToucheSir · April 28, 2020, 12:36am

This is amazing to hear. Are you referring to the Tensor Compute Primitives Proposal as the “tensor IR” in question?

Also, have any of the JuliaGPU contributors looked into MLIR-based tensor/linalg runtimes such as IREE? I know XLATools.jl exists, but being able to wrap a native library instead of jaxlib seems like a plus

jpsamaroo · April 28, 2020, 2:56pm

It would become another IR in the stack, most likely. However, to make full use of it, we’ll need some way to “lower” Julia’s array operations into the statements that match the semantics we need, so Julia’s own IR(s) could potentially be influenced by this work to make said lowering easier to implement.

jpsamaroo · April 28, 2020, 3:00pm

You’re probably right. I’m not the MLIR expert or developer by any means, I’m just communicating what I know about some of the work that’s being done, so everything I say about this should be taken with a grain of salt

I can’t speak for the JuliaGPU contributors, but I haven’t heard anything about anyone targeting IREE. I suspect that’s because we don’t yet have MLIR support in Julia, and that’s probably a blocker to targeting IREE. I suspect in a few months it’ll be worth putting that option on the table.

Topic		Replies	Views
What happened to XLA.jl Machine Learning	16	4737	March 12, 2023
What are the future plans for Scientific ML? Machine Learning	1	444	November 4, 2024
Thoughts on JAX vs Julia Community jax	21	11127	October 25, 2022
Automatic Differentiation (AD) in Python compared to Julia and AD Basics Machine Learning	6	2239	October 22, 2021
Automatic Differentiation (AD) in Julia vs. Python (or PyTorch) Machine Learning autodiff	14	1586	January 16, 2025

Thoughts on Jax vs CuArrays and Zygote

Related topics