Just learned about “The simple essence of automatic differentiation”.
Did somebody already try to implement these ideas in Julia?
Any thoughts about pros and cons of this approach relative to e.g. Zygote.jl?
I haven’t looked in great detail but from what I remember, I don’t think there’s really “an approach” here, so much as a translation of known ideas in AD (e.g. reversal of a Wengert list) to the language of category theory (e.g. taking the dual of a category). Maybe for some people that makes it seem simpler, but I’m not in that boat myself; I’d be interested in seeing something more implementation-focused though.
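For anyone who hasn't seen it spelled out, the "reversal of a Wengert list" mentioned above can be sketched roughly as follows. This is a toy in Python, not how Zygote or the paper does it; `PRIMITIVES`, `run_forward` and `run_reverse` are names made up for illustration, and only three scalar operations are supported.

```python
import math

# Hypothetical primitive table: op name -> (forward function, partial derivatives).
PRIMITIVES = {
    "mul": (lambda a, b: a * b, lambda a, b: (b, a)),
    "sin": (lambda a: math.sin(a), lambda a: (math.cos(a),)),
    "add": (lambda a, b: a + b, lambda a, b: (1.0, 1.0)),
}

def run_forward(tape, env):
    # Evaluate each entry in order; env keeps every intermediate value.
    for out, op, args in tape:
        forward, _ = PRIMITIVES[op]
        env[out] = forward(*(env[a] for a in args))
    return env

def run_reverse(tape, env, output):
    # Walk the same list backwards, accumulating adjoints.
    grads = {name: 0.0 for name in env}
    grads[output] = 1.0
    for out, op, args in reversed(tape):
        _, partials = PRIMITIVES[op]
        for a, p in zip(args, partials(*(env[x] for x in args))):
            grads[a] += grads[out] * p
    return grads

# Wengert list for f(x, y) = sin(x * y) + x
tape = [
    ("t1", "mul", ("x", "y")),
    ("t2", "sin", ("t1",)),
    ("t3", "add", ("t2", "x")),
]
env = run_forward(tape, {"x": 2.0, "y": 3.0})
grads = run_reverse(tape, env, "t3")
```

Here `grads["x"]` and `grads["y"]` hold the partial derivatives of `f` at `(2.0, 3.0)`. Note that the reverse pass reads the intermediate `t1` back out of `env`.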
I got the impression that tapes (Wengert lists) are not needed in Elliott’s approach; but I am not an expert.
You always need to store intermediate values somewhere. There are lots of fun ways to hide them (e.g. in closure structures which get chained), but you fundamentally can’t get around that (well, you can re-compute, but at that point you’re not really doing reverse mode anymore). Something else people sometimes mean when they say tape-free is that the operations themselves don’t get recorded on the tape, because the operation is implicit (in the code generated by the AD framework), but that’s not really specific to the formulation. FWIW, I do like the push for more formalism in AD, particularly as it relates to connecting to normal notions in differential geometry. However, I also think that it generally has little impact on implementation.
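To make the "hidden in chained closures" point concrete, here is a minimal sketch in Python. The helper names `mul_with_pullback` and `sin_with_pullback` are invented for this example, and this is not a description of any particular AD package. There is no explicit tape, yet each returned closure still captures the intermediate values it needs for the reverse pass.

```python
import math

def mul_with_pullback(a, b):
    # The closure captures a and b: these are the stored intermediates.
    return a * b, lambda g: (g * b, g * a)

def sin_with_pullback(a):
    # The closure captures a so that cos(a) can be evaluated later.
    return math.sin(a), lambda g: (g * math.cos(a),)

def f(x, y):
    # f(x, y) = sin(x * y), written so that it also returns its pullback.
    t1, back_mul = mul_with_pullback(x, y)
    t2, back_sin = sin_with_pullback(t1)

    def pullback(g):
        # Chain the captured closures in reverse order of the forward pass.
        (g_t1,) = back_sin(g)
        return back_mul(g_t1)

    return t2, pullback

value, pullback = f(2.0, 3.0)
gx, gy = pullback(1.0)

# The "tape" is still there, just stored in closure cells:
_, back = sin_with_pullback(6.0)
stored = back.__closure__[0].cell_contents  # the captured input a
```

The order of operations is implicit in the code of `pullback`, which is one sense of "tape-free", but `stored` shows that the value from the forward pass is kept alive all the same.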
I have been wondering whether doing this in Julia would be more inference-friendly. But perhaps the chains get too long after a while.
To some extent yes, which is why Zygote gets better type information (e.g. all the TPU work needs that).