I have a question about ReverseDiff. I am doing some simple benchmarking of different Julia tools for automatic differentiation. In theory (at least as far as I understand automatic differentiation theory), computing a gradient with reverse mode should take no more than five times as long as evaluating the objective function itself.
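Written out, the bound I have in mind (usually called the cheap gradient principle) is roughly the following. Here "cost" counts arithmetic operations in an idealized model rather than measured wall-clock time, and the constant $\omega$ does not grow with the dimension $n$ of $x$:

$$
\operatorname{cost}\bigl(\nabla f(x)\bigr) \;\le\; \omega \cdot \operatorname{cost}\bigl(f(x)\bigr), \qquad \omega \le 5 .
$$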
So I took the extended Rosenbrock function:
```julia
function f(x)
    n = length(x)  # extended Rosenbrock function in n = length(x) variables
    return 100.0 * sum((x[i] - x[i - 1]^2)^2 for i = 2:n) + (1.0 - x[1])^2
end
```
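For reference, this is what reverse mode has to compute for this particular f, written out by hand. It is a minimal sketch, assuming n = length(x); the names `rosen` and `rosen_grad` are hypothetical, and this is not how ReverseDiff works internally (ReverseDiff records a tape of operations at runtime and then sweeps it backwards):

```julia
# Extended Rosenbrock function (same math as f above), written as a plain loop.
function rosen(x::AbstractVector{<:Real})
    s = zero(float(eltype(x)))
    for i in 2:length(x)
        s += (x[i] - x[i-1]^2)^2
    end
    return 100 * s + (1 - x[1])^2
end

# Hand-written reverse-mode (adjoint) gradient of `rosen`, with seed fbar = 1.
# Each term 100*t_i^2 with t_i = x[i] - x[i-1]^2 only touches x[i] and x[i-1],
# so the forward value and its adjoint can be computed in the same iteration.
function rosen_grad(x::AbstractVector{<:Real})
    n = length(x)
    g = zeros(float(eltype(x)), n)
    for i in 2:n
        t = x[i] - x[i-1]^2              # forward value of the residual t_i
        tbar = 200 * t                   # d(100 t^2)/dt
        g[i]   += tbar                   # dt_i/dx[i] = 1
        g[i-1] += tbar * (-2 * x[i-1])   # dt_i/dx[i-1] = -2 x[i-1]
    end
    g[1] += -2 * (1 - x[1])              # d((1 - x[1])^2)/dx[1]
    return g
end
```

Each iteration of `rosen_grad` does roughly seven floating-point operations against four per term in `rosen`, so for this function the hand-written adjoint stays well within the factor of five.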
For all the values of n I tested, the time to compute the gradient is much more than 5 times the time it took to compute f(x). The testing I did was simple:
```julia
using ReverseDiff

n = 1000                                  # also tried 100 and 500
x = rand(n)
t = @elapsed f(x)                         # time for one evaluation of f
t_g = @elapsed ReverseDiff.gradient(f, x) # time for one reverse-mode gradient
```
With n = 100: t = 2.224000e-06 s and t_g = 9.482440e-04 s (t_g/t ≈ 426)
With n = 500: t = 2.054000e-06 s and t_g = 4.637314e-03 s (t_g/t ≈ 2258)
With n = 1000: t = 3.007000e-06 s and t_g = 9.125266e-03 s (t_g/t ≈ 3035)
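A single `@elapsed` of a call that takes a few microseconds is close to timer resolution, and the first call of a Julia function also includes JIT compilation. A minimal sketch of a timing helper that warms up first and averages over many calls, using only Base (the helper name `mean_time` is hypothetical; BenchmarkTools' `@btime` is the more standard tool for this):

```julia
# Average time per call (seconds) of g(x) over `reps` calls, excluding the
# first call, which in Julia also includes JIT compilation for this input type.
# Returns (seconds_per_call, checksum); accumulating the results makes it less
# likely that the compiler drops the calls as unused work.
function mean_time(g, x; reps::Integer = 1_000)
    acc = sum(g(x))                # warm-up call: compiles g for typeof(x)
    t0 = time_ns()
    for _ in 1:reps
        acc += sum(g(x))           # sum works for a scalar f(x) and a gradient vector
    end
    seconds = (time_ns() - t0) / 1e9
    return seconds / reps, acc
end

# The objective from the question, with n taken from length(x).
f(x) = 100.0 * sum((x[i] - x[i - 1]^2)^2 for i = 2:length(x)) + (1.0 - x[1])^2

x = rand(1000)
t_f, _ = mean_time(f, x)
println("f(x): ", t_f, " s per call")
```

With ReverseDiff loaded, `mean_time(x -> ReverseDiff.gradient(f, x), x)` times the gradient the same way, so the two averages can be compared as a ratio.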
Am I misunderstanding the theory? Or is there something in my code that’s wrong? I am really confused by the disparity between theory and real life…
Thanks for your help!
(I am new to Discourse, so I don’t know if this is the right place to ask this question…)