What lessons could Julia's autodiff ecosystem learn from tinygrad?

To me, every Python AD framework feels like a walled garden: each comes with its own incompatible tensor types and a slightly different (sub)set of supported operations. Some of these gardens have simply grown very large by now, e.g., PyTorch is aiming to reduce its operator set from 2000+ to around 250 core ones.
In Julia, I can just write my model – possibly combining several libraries – and then try different ADs on it. AbstractDifferentiation in particular makes it very easy to switch the AD backend. In my experience, ForwardDiff and ReverseDiff have worked quite reliably (even in some crazy use cases that I would not have tried in Python to begin with). I did face several issues with Zygote, though, either hitting limitations or getting silently wrong gradients, so I tend to avoid it for now.
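To make the "switch the backend, keep the model" point concrete, here is a minimal sketch using AbstractDifferentiation.jl's backend types (`AD.ForwardDiffBackend`, `AD.ReverseDiffBackend`); the toy `loss` function is just an illustration, and the exact backend constructors assume ForwardDiff and ReverseDiff are installed and loaded:

```julia
# Sketch: the same model code differentiated with different AD backends
# via AbstractDifferentiation.jl.
import AbstractDifferentiation as AD
using ForwardDiff, ReverseDiff

# Model written once, with plain Julia arrays and no backend-specific types.
loss(θ) = sum(abs2, θ .- 1.0) + 0.1 * sum(sin, θ)

θ = randn(5)

for backend in (AD.ForwardDiffBackend(), AD.ReverseDiffBackend())
    # AD.gradient returns a tuple with one gradient per argument.
    g, = AD.gradient(backend, loss, θ)
    println(nameof(typeof(backend)), ": ", g)
end
```

The model itself never mentions a backend, so comparing ADs (or dropping one that misbehaves) is a one-line change in the loop above.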
