Thank you for your in-depth reply. I agree with everything you have said, and it brings me to the main point of discussion. Your solution of defining a closure and differentiating it, changing the value of the index each time, is what my current code already does via a changing anonymous function. However, ForwardDiff has to recompute the gradient from scratch at every step, for example:
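using ForwardDiff: gradient

f(x::AbstractVector, theta::AbstractVector, i) = (x .* theta)[i]

data = rand(3)        # rand(3, 1) returns a 3×1 Matrix, which does not match AbstractVector
parameters = rand(3)
n = length(data)
results = Vector{Vector{Float64}}(undef, n)   # one gradient vector per index
for k in 1:n
    # ForwardDiff.gradient takes a single array argument, so stack both inputs into one vector
    results[k] = gradient(z -> f(z[1:n], z[n+1:end], k), vcat(data, parameters))
end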
Obviously this is a silly example, since one could simply redefine the function and build the Jacobian with a single call, but the intent remains the same. If I absolutely had to write the function this way, how would you approach speeding this process up?
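For reference, the single-call version I mean would look roughly like the sketch below. The helper h and the stacking of both inputs into one vector z are my own illustrative choices; the only library call assumed is ForwardDiff.jacobian(f, x).

```julia
using ForwardDiff: jacobian

data = rand(3)
parameters = rand(3)
n = length(data)

# Hypothetical helper: returns every entry of x .* theta at once,
# with x and theta stacked into a single input vector z = [x; theta].
h(z) = z[1:n] .* z[n+1:end]

# J has size n × 2n. Row k holds the gradient of (x .* theta)[k]
# with respect to [x; theta], so one call replaces the whole loop.
J = jacobian(h, vcat(data, parameters))
```

Row k of J should match what the loop stores for index k, which is why the loop form only matters when the function really cannot be rewritten like this.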
The reason ReverseDiff’s tape compilation springs to mind is that the underlying structure of the gradient tape remains the same for each call; only the value of i changes. Intuitively, this means we should be able to reuse the same tape across a set of closures that differ only in the value of i.
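To make the idea concrete, here is a rough sketch of the kind of reuse I have in mind. It does not reuse the closures themselves: as far as I understand, an integer index captured by a closure is fixed when the tape is recorded, so I replace it with a one-hot selector vector e that is passed as an ordinary input. The function g and the selector e are hypothetical and not part of my real code; the ReverseDiff calls assumed are GradientTape, compile and gradient!.

```julia
using ReverseDiff: GradientTape, compile, gradient!

# Hypothetical variant of f: instead of indexing with i, a one-hot
# vector e picks out the entry, so i becomes tape input data.
function g(x, theta, e)
    y = x .* theta
    return sum(y .* e)
end

data = rand(3)
parameters = rand(3)
e = zeros(3)

# Record the tape once and compile it; its structure does not depend on e.
tape = compile(GradientTape(g, (data, parameters, e)))

# Preallocated output buffers, one per input.
out = (similar(data), similar(parameters), similar(e))
grads = Vector{Vector{Float64}}(undef, 3)
for k in 1:3
    fill!(e, 0.0)
    e[k] = 1.0                                   # select entry k
    gradient!(out, tape, (data, parameters, e))  # replay the same compiled tape
    grads[k] = vcat(out[1], out[2])              # gradient w.r.t. [x; theta]
end
```

The cost of this trick is that the gradient with respect to e is computed as well and then thrown away, so whether it beats the ForwardDiff loop would need benchmarking.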