I typically differentiate log-likelihood functions of the form x -> f(x, data) with respect to x, where x is a multi-dimensional input, data is a fixed dataframe, and f is a scalar-valued function. To do so, I usually use either ForwardDiff or ReverseDiff. I was recently looking back at the ReverseDiff documentation and saw a recommendation to use a ReverseDiff.GradientTape to prerecord f. I was wondering if there is a way to pre-compile “tapes” for functions of the form x -> f(x, data). The relevant links which made me ask this are
While searching for a solution, I have also read discussions about Capstan.jl, but I am unable to appreciate how this new package (there seems to be a lot of excitement about it!) will improve on the existing implementations in ForwardDiff.jl and ReverseDiff.jl.
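For concreteness, here is a stripped-down version of what I currently do, with a made-up Gaussian log-likelihood standing in for my real f, together with my (possibly naive) attempt at applying the tape workflow from the ReverseDiff docs to the closure x -> f(x, data):

```julia
using ForwardDiff, ReverseDiff

# toy stand-in for my real log-likelihood; `data` is fixed,
# x = (mean, log standard deviation)
data = randn(100)
f(x, data) = -sum(abs2, data .- x[1]) / (2 * exp(2 * x[2])) - length(data) * x[2]

x0 = zeros(2)

# what I currently do: close over `data` and differentiate
ForwardDiff.gradient(x -> f(x, data), x0)
ReverseDiff.gradient(x -> f(x, data), x0)

# the tape workflow from the docs, applied to the same closure
tape  = ReverseDiff.GradientTape(x -> f(x, data), x0)   # record the operations once
ctape = ReverseDiff.compile(tape)                        # compile the tape for reuse
g = similar(x0)
ReverseDiff.gradient!(g, ctape, [0.1, 0.3])              # repeated fast gradients
```

I am not sure whether closing over data like this is the intended way to handle the extra argument, which is what prompted the question.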
I found tapes to be very fragile in practice (a lot of seemingly innocuous Julia code has branches, which will break a recorded tape).
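For example (a contrived sketch of mine, not taken from any package docs): a compiled tape only replays the operations that were recorded, so a value-dependent branch silently bakes in whichever path was taken at recording time:

```julia
using ReverseDiff

# value-dependent branch: only the path taken at recording time is stored
h(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

tape = ReverseDiff.compile(ReverseDiff.GradientTape(h, [1.0, 2.0]))  # records the x[1] > 0 branch

g = zeros(2)
ReverseDiff.gradient!(g, tape, [-1.0, 2.0])  # replays sum(abs2, x) regardless of the sign of x[1]
g  # [-2.0, 4.0], whereas the correct gradient of h at this point is [1.0, 1.0]
```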
I have created a simple interface package
which allows you to define an \mathbb{R}^n \to \mathbb{R} callable (so put the data in, e.g., a struct), and then AD it via either ForwardDiff, ReverseDiff, Flux, or Zygote (experimental). I have found this to be the most robust approach in my experiments, even though the ReverseDiff documentation suggests using ReverseDiff for problems where f: \mathbb{R}^n \rightarrow \mathbb{R} with n > 1 (while also mentioning that ForwardDiff can be faster for low-dimensional inputs).
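Independent of the package, the underlying pattern of putting the data in a struct and exposing a plain \mathbb{R}^n \to \mathbb{R} callable to the AD backend looks roughly like this (a sketch with made-up names, using ForwardDiff as the backend):

```julia
using ForwardDiff

# put the fixed data in a struct and make it callable, so the AD backend
# only ever sees a map ℝⁿ → ℝ (names here are made up for illustration)
struct LogLik{T}
    data::T
end

(ℓ::LogLik)(x) = -sum(abs2, ℓ.data .- x[1]) / (2 * exp(2 * x[2])) - length(ℓ.data) * x[2]

ℓ = LogLik(randn(100))
ForwardDiff.gradient(ℓ, zeros(2))  # or ReverseDiff.gradient(ℓ, zeros(2)), etc.
```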