What lessons could Julia's autodiff ecosystem learn from Stan's TinyGrad?

I would spend 60% on overhead to MIT and then have 2 PhD students. Or a solid team of 4 experienced people, without coursework to do, over the same time period. $5M isn't much from an organizational perspective, but it is enough to keep a project alive.

There's been no shortage of headaches because Zygote made a major misstep in not committing to mutation support. Under the hood, almost everything in the Julia standard library uses mutation. Not committing to mutation support effectively means a non-general AD unless you wrap every single thing in the standard library. The idea behind it was that functional programming is beautiful, therefore Julia should be a functional programming language, and therefore mutation does not need to be supported, but this is the kind of speculative wishful thinking that cornered it. The starting question of any AD in Julia needs to be "how do you support mutation well?", and then work back from there, since otherwise you will never support "standard" Julia code. This is why Diffractor never stood a chance: it had this missing from the start. And it's why Enzyme is doing well.
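To make the failure mode concrete, here is a minimal sketch (the function is illustrative, not from the thread; it assumes Zygote.jl's standard `gradient` API, and the exact error text may vary by version):

```julia
using Zygote

# Ordinary "standard Julia" style: preallocate a buffer and fill it in place.
function sumsq!(buf, x)
    for i in eachindex(x)
        buf[i] = x[i]^2   # in-place write, i.e. mutation
    end
    return sum(buf)
end

f(x) = sumsq!(similar(x), x)

# Zygote throws an error along the lines of
# "Mutating arrays is not supported" on code like this,
# even though it is idiomatic Julia:
Zygote.gradient(f, [1.0, 2.0, 3.0])
```

An AD like Enzyme, which works at a lower level, can differentiate this kind of in-place code directly; with Zygote the user has to rewrite it in a non-mutating style first.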

With JAX, running standard code unchanged essentially never happens. You need to change all control flow to use lax objects, and you have to use pure, non-mutating functions. JAX has had time to be adopted and is also in Python, but in the end it is really only picking up mindshare in the nerdiest of circles.
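As a concrete illustration of that rewrite burden, here is a small sketch (function names are mine, not from the thread) of the same loop written in plain Python style versus the pure, lax-based style that `jit` requires:

```python
import jax
import jax.numpy as jnp
from jax import lax

# Plain Python style: data-dependent `if` and an accumulator variable.
# This runs eagerly, but the `if v > 0:` branch on a traced value
# fails under jax.jit.
def relu_sum_py(x):
    total = 0.0
    for v in x:
        if v > 0:
            total = total + v
    return total

# JAX style: control flow rewritten with lax primitives, no mutation.
@jax.jit
def relu_sum_jax(x):
    def body(carry, v):
        # jnp.where replaces the Python `if` branch
        return carry + jnp.where(v > 0, v, 0.0), None
    total, _ = lax.scan(body, 0.0, x)
    return total

x = jnp.array([-1.0, 2.0, 3.0])
print(relu_sum_jax(x))  # same result as the plain version
```

Both functions compute the same thing, but only the second is acceptable to JAX's tracer, and the transformation is entirely on the user.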

And it shouldn’t be surprising given how people talk about it:

https://www.reddit.com/r/MachineLearning/comments/11myoug/d_jax_vs_pytorch_in_2023/

I've seen the same thing over and over and over: "Functional programming is better, so all we have to do is make everyone see the light, and when everyone realizes functional programming is the master race, X will be the best." You can find Stack Overflow threads from 2010 espousing the same concept.

If there's one story that has played out over and over in programming languages, it's exactly this "future" of functional programming. And time and time again, actual developers have thought "that's cool" and have stuck to programming in multi-paradigm languages where it's easier to develop.

People keep targeting functional programming because it’s easier for compilers, but compilers don’t adopt a language, developers do.

I think the moral of the story is that the shift in "late Zygote" to drop mutation support entirely, because it was known that it couldn't be done fast, and to fall back to saying everyone should do functional programming, was living in a dream world built for compilers, not for humans. You do need to meet people where they are, even if performance is sometimes not perfectly optimal. PyTorch has gotten a lot of adoption even though its performance is not always optimal.

Automatic differentiation needs to stress the “automatic” before trying to differentiate itself on performance.
