Partly, but most of these operations really are just uncommon; there’s a reason most people can use ReverseDiff.jl just fine without any issues.
I do very much agree with this.
What currently happens is that most AD effort is put into Diffractor.jl and into maintaining Zygote. These two packages are super hard to implement and maintain. Just look at Diffractor’s implementation and you will see why. The same goes for Enzyme.jl. Few people can deal with them, and developing only these packages is not a good sign at all.
What I wish would happen from now on is to focus on and promote the simpler packages Tracker.jl or Autograd.jl, which are a lot easier to implement and understand (cost effective), and where you can get the community involved. How to promote them? Well, by having them work with Flux again, and I think one of them should be the default for Flux. As @ToucheSir mentioned, this will serve nearly all needs in “classical” CNNs.
@dlakelan: Don’t know how I found this, but I wanted to set the record straight. We built the entire first released version of Stan, CmdStan, and RStan in 1.5 years on a $400K DoE grant plus some pocket change floating around the stats department. We were all paid under $100K/year or the budget wouldn’t have worked. Almost all of the C++ programming for the first release was done by me and Daniel Lee, with Matt Hoffman helping out with the dynamic memory design for the autodiff and coding up NUTS. At the same time, Jiqiang Guo, who was a postdoc of Andrew Gelman’s, wrote most of the RStan interface for the first release. Ben Goodrich was around and helping out. I can’t recall if Allen Riddell had started work on PyStan at that point. By the time I left Columbia 5 years ago, Stan had received around $9M in funding. But of course, almost none of that funding was exclusively for Stan because research grant agencies don’t like to fund software.
Having said that, the result wasn’t nearly the extensive autodiff library we have now. I made a few key design mistakes, like not keeping contiguous memory for arrays, which would have allowed reshaping (they are nested std::vector in C++), and using an array-of-structs rather than a struct-of-arrays pattern for matrices. The SoA vs. AoS decision is being fixed now. The second big mistake was trying to have RStan and PyStan communicate with Stan over high-level interfaces like Rcpp and Cython. We did things right eventually with BridgeStan. I probably should have also prohibited branching on parameters; you can’t do that in JAX or other static autodiff systems. I also should’ve adopted the adjoint-vector product formulation of autodiff more explicitly. It was implicit in the closures/continuations we built for the backward pass, but I really like the way Zygote did that in Julia.
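For readers less familiar with the AoS vs. SoA distinction mentioned above, here is a minimal C++ sketch of the two layouts for a matrix of reverse-mode variables. The type and field names (`var_aos`, `matrix_soa`, etc.) are illustrative only, not Stan Math’s actual types.

```cpp
// Illustrative sketch (not Stan's real types): array-of-structs vs. struct-of-arrays.
#include <cstddef>
#include <vector>

// Array of structs (AoS): each element carries its value and adjoint together,
// so a matrix of vars interleaves values and adjoints in memory.
struct var_aos {
    double value;
    double adjoint;
};
using matrix_aos = std::vector<var_aos>;  // row-major, size = rows * cols

// Struct of arrays (SoA): one contiguous buffer of values and one of adjoints,
// which vectorizes better and lets the values go straight to dense linear algebra.
struct matrix_soa {
    std::size_t rows, cols;
    std::vector<double> values;    // contiguous, rows * cols
    std::vector<double> adjoints;  // contiguous, rows * cols
};

int main() {
    const std::size_t n = 3, m = 4;

    matrix_aos a(n * m, var_aos{0.0, 0.0});  // interleaved value/adjoint pairs
    matrix_soa b{n, m,
                 std::vector<double>(n * m, 0.0),
                 std::vector<double>(n * m, 0.0)};

    // With SoA, b.values.data() is a plain contiguous double buffer, so it can be
    // reshaped or handed to a BLAS-style routine without copying; with AoS the
    // values would first have to be gathered out of the interleaved layout.
    (void)a;
    (void)b;
    return 0;
}
```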
P.S. In case people didn’t know, that $400K grant was a sub-award to Columbia from the DoE grant to MIT that led to Julia (and if you don’t know anything about academic funding, a $400K grant to Columbia translates to about $200K of salary). We applied for follow-on funding and the DoE laughed (more like shouted) us out of the room for building toys rather than the scalable tools they needed—we didn’t even make it past the pre-proposal stages despite having released the first public versions of both Stan and Julia under that one grant!
P.P.S. To the best of my recollection, here’s the story of how Stan grew (your site won’t let me include links): statmodeling.stat.columbia.edu/2022/10/12/0-to-100k-users-in-10-years-how-stan-got-to-where-it-is/ (plus a free bonus link to a clip mentioning Stan and Julia on the TV show Billions!)
Oh that’s very fun! Thanks for chiming in.
Great to see you in here, Bob. Your contribution to probabilistic programming has enabled a lot of scientists to easily build solid Bayesian models. I never knew the story behind Stan, so it’s really cool to read about it. I’ll check out the link you provided.
Hey Bob, thanks so much for getting the details in there. I have nothing but the utmost respect for what the whole team accomplished. And yes, it’s pretty cool that Julia and Stan both came out of the same seeds, so to speak.