ReverseDiff: differentiating a function with an indexing argument

I am looking to move from ForwardDiff to ReverseDiff because of the speedup offered by compiled tapes. I would like to differentiate a function where one of the arguments is an index into the output, and then compile that jacobian calculation so it can be reused for the remaining indexes. However, this throws an error.

How would one differentiate the following function using ReverseDiff?

f(x::Vector, theta::Vector, i) = (x.*theta)[i]

Currently I have

using ReverseDiff: jacobian
jacobian(f, ([1.0, 2.0], [3.0, 4.0], [2.0]))
# Indexing with an integer gives the same error

which gives the error:

ArgumentError: invalid index: TrackedReal<59S>(2.0, 0.0, 2eM, 1, HO3) of type ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}

Maybe the example is too simplified, but isn’t this simply f(x,theta,i) = x[i]*theta[i], only much slower?

(The OP’s question remains, as ReverseDiff does not compute the gradient of that either; I guess i is not an actual variable, which is what makes this confusing.) You could do:

julia> using ReverseDiff

julia> f(x,y,i) = x[i]*y[i]
f (generic function with 1 method)

julia> i = 2
2

julia> ReverseDiff.gradient( (x,y) -> f(x,y,i), ([1,2],[3,4]))
([0, 4], [0, 2])

Taking a step back, what should the answer be? What is the partial derivative of an indexing operation like y[i] w.r.t. the index i? Since i must be integer-valued, a partial derivative w.r.t. it does not make a great deal of sense, right?

You can’t do this with ForwardDiff either, for exactly the same fundamental reason:

julia> using ForwardDiff

julia> x = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> ForwardDiff.derivative(i -> x[i], 2.0)
ERROR: ArgumentError: invalid index: Dual{ForwardDiff.Tag{var"#5#6",Float64}}(2.0,1.0) of type ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6",Float64},Float64,1}
Stacktrace:
 [1] to_index(::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6",Float64},Float64,1}) at ./indices.jl:297

(note the same “invalid index” error)

The solution is not to try to take a derivative w.r.t. a non-differentiable argument. Instead, define a closure that captures a particular value of the index, and then take derivatives of that function w.r.t. just the real-valued arguments:

julia> f(x::AbstractVector, theta::AbstractVector, i) = (x.*theta)[i]
f (generic function with 3 methods)

julia> g(x, theta) = f(x, theta, 2)
g (generic function with 1 method)

You also need gradient, not jacobian, since your function returns a scalar, not a vector:

julia> using ReverseDiff: gradient

julia> gradient(g, ([1.0, 2.0], [3.0, 4.0]))
([0.0, 4.0], [0.0, 2.0])

There’s also no need to define g explicitly; you can use an anonymous function instead:

julia> gradient((x, theta) -> f(x, theta, 2), ([1.0, 2.0], [3.0, 4.0]))
([0.0, 4.0], [0.0, 2.0])

Thank you for your in-depth reply. I agree with everything you have said, and this brings me to the main point of discussion. Your solution, defining a closure and differentiating it while changing the value of the index each time, is what my current code does via a changing anonymous function. However, ForwardDiff then has to redo the whole gradient computation at every step, for example:

using ForwardDiff: gradient
f(x::AbstractVector, theta::AbstractVector, i) = (x.*theta)[i]
data = rand(3)
parameters = rand(3)
results = zeros(3, 3)  # row k holds the gradient w.r.t. theta for index k
# ForwardDiff.gradient takes a single array, so differentiate w.r.t. theta only
for k = 1:3
    results[k, :] = gradient(theta -> f(data, theta, k), parameters)
end

Obviously this is a silly example, as one could simply redefine the function and construct a jacobian with a single call (as sketched below), but the intent remains the same. If I absolutely had to write a function this way and wanted to speed this process up, how would you approach it?
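
For completeness, here is what that single call might look like, a minimal sketch assuming (as in the loop above) that we only need derivatives w.r.t. theta; h is just an illustrative name:

using ReverseDiff
data = rand(3)
parameters = rand(3)
h(theta) = data .* theta                 # h(theta)[k] == f(data, theta, k)
J = ReverseDiff.jacobian(h, parameters)  # row k is the gradient for index k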

The reason ReverseDiff’s compilation springs to mind is that the underlying structure of the gradient tape remains the same for each call; only the value of i changes. Intuitively, this means we should be able to reuse the same tape for multiple calls to a set of closures, each only incrementing the value of i. The standard compile-once/replay pattern is sketched below for reference.
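
The usual pattern reuses one compiled tape across many inputs; the question is whether it can also be reused across values of a captured index. A minimal sketch of the standard pattern (s is just a stand-in scalar function):

using ReverseDiff: GradientTape, compile, gradient!
s(theta) = sum(abs2, theta)
tape = compile(GradientTape(s, rand(4)))
out = zeros(4)
gradient!(out, tape, [1.0, 2.0, 3.0, 4.0])  # out == [2.0, 4.0, 6.0, 8.0]
gradient!(out, tape, [5.0, 6.0, 7.0, 8.0])  # same compiled tape, new input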

Maybe one approach could be to wrap the index in a Ref that is closed over by the closure. The Ref can then be mutated outside the closure while the closure itself stays the same? The plain-Julia sketch below shows the mechanics.
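
A minimal sketch of the Ref mechanics, with no AD involved (idx and pick are just illustrative names): the closure captures the Ref object itself, so mutating its contents changes what the closure computes on later calls without redefining anything.

idx = Ref(1)
pick(v) = v[idx[]]
pick([10.0, 20.0])  # 10.0
idx[] = 2           # mutate in place; pick itself is unchanged
pick([10.0, 20.0])  # 20.0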

Thank you for the idea; I think this is indeed what I am trying to achieve. Please correct me if I’m wrong, but my understanding is that we can define i as a reference to a memory location, and mutating its contents will then somehow carry through to the compiled tape of the closure?

I have naively tried to implement this like so:

using ReverseDiff: gradient!, GradientTape, compile
f(x, theta, i) = (x.*theta)[i[]] # note: the index is now read via i[]
g(theta) = f(x, theta, i)

x = rand(4)
theta = rand(4)
i = Ref(1)
g_results = zeros(size(theta))

g_tape = GradientTape(g, rand(1:0.1:4, size(theta)))
compiled_g_tape = compile(g_tape)

grad1 = gradient!(g_results, compiled_g_tape, theta)

i[] = 2  # mutate the Ref in place

grad2 = gradient!(g_results, compiled_g_tape, theta)

However, grad1 and grad2 unfortunately come out identical, with the nonzero entry stuck in the first position rather than following the current value of i:

4-element Array{Float64,1}:
 0.11097540880585721
 0.0
 0.0
 0.0
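
As a sanity check (a sketch of my reading, not a definitive diagnosis): re-recording a fresh tape after the mutation does pick up the new index, which suggests the concrete index gets baked into the tape at recording time and the compiled replay never consults the Ref again.

fresh_tape = compile(GradientTape(g, rand(4)))  # re-record with i[] == 2
grad3 = gradient!(zeros(size(theta)), fresh_tape, theta)
# grad3 has its nonzero entry in the second position, unlike grad2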

Could you provide a small example demonstrating your suggestion? My apologies if this is a simple concept, but I am not experienced with Ref or pointer arithmetic.