What is the right way to compute the gradient of a function of 2 variables with automatic differentiation?

I’m terribly confused by the number of packages that provide autodiff functionality, and by their peculiarities.

I need to compute the gradient of a multivariable function (e.g. f(x, y), where x and y are Numbers).
I found that AutoDiffSource and ReverseDiffSource are no longer supported.

I managed to do it with the ReverseDiff.jl package in the following way:

using ReverseDiff

f(a, b) = (1-a).^2 + 100*(a-b.^2).^2
g = (x, y) -> ReverseDiff.gradient(f, (x, y))

But it looks odd to me.

I have to pass explicit Arrays and write broadcasted (.^) operations in the function definition. Why?
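For reference, a fully runnable sketch of what I did (my understanding is that ReverseDiff.gradient wants array inputs and a scalar output, hence the one-element arrays and the sum):

```julia
using ReverseDiff

# broadcasting (.-, .^, .*) because the arguments are (one-element) arrays;
# sum(...) so the function returns a scalar, as ReverseDiff.gradient expects
f(a, b) = sum((1 .- a).^2 .+ 100 .* (a .- b.^2).^2)

# passing a tuple of arrays makes ReverseDiff call f(a, b) and
# return a tuple of gradient arrays, one per argument
g = (x, y) -> ReverseDiff.gradient(f, (x, y))

g([1.0], [2.0])
```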

I managed to compute it with ForwardDiff.jl using derivative, but not gradient.

using ForwardDiff

g(a, b) = (1-a)^2 + 100*(a-b^2)^2
# partial derivatives, evaluated at an arbitrary point (x, y)
dfdx = (x, y) -> ForwardDiff.derivative(a -> g(a, y), x)
dfdy = (x, y) -> ForwardDiff.derivative(b -> g(x, b), y)

@show dfdx(1, 2)
@show dfdy(1, 2)

Are these two solutions equivalent?

What is the right way to solve this problem?


What derivatives do you want? Do you mean:

julia> g(a, b) = (1-a)^2 + 100*(a-b^2)^2
g (generic function with 1 method)

julia> ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0])
2-element Array{Float64,1}:
 -600.0
 2400.0

Giving you [df/da, df/db] at a = 1.0, b = 2.0
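If you want to double-check, the partials of this g are easy to derive by hand, and ForwardDiff matches them:

```julia
using ForwardDiff

g(a, b) = (1 - a)^2 + 100 * (a - b^2)^2

# hand-derived partial derivatives of g
dgda(a, b) = -2 * (1 - a) + 200 * (a - b^2)
dgdb(a, b) = -400 * b * (a - b^2)

ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0]) ≈ [dgda(1.0, 2.0), dgdb(1.0, 2.0)]  # true
```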


I want grad_f(x, y), which can evaluate the gradient at an arbitrary point (x, y).

julia> grad(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), [x, y])
grad (generic function with 1 method)

julia> grad(2,3)
2-element Array{Int64,1}:
 -1398
  8400

Points are typically represented in Julia using Vectors (or, for small dimensions, StaticVectors). So writing it as something like

julia> g(x) = (1-x[1])^2 + 100*(x[1]-x[2]^2)^2 # could of course unpack x into a and b here
g (generic function with 2 methods)

julia> grad(x) = ForwardDiff.gradient(g, x)
grad (generic function with 2 methods)

julia> grad([2,3])
2-element Array{Int64,1}:
 -1398
  8400

julia> using StaticArrays # good for small dim vectors

julia> grad(SVector(2,3))
2-element SArray{Tuple{2},Int64,1,2}:
 -1398
  8400

might be a bit more natural.
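As the comment above says, you can of course unpack x into named variables inside the function; a sketch of that variant (using g2/grad2 here just to avoid clashing with the definitions above):

```julia
using ForwardDiff

# same function, but unpacking the point vector into named variables
function g2(x)
    a, b = x
    (1 - a)^2 + 100 * (a - b^2)^2
end

grad2(x) = ForwardDiff.gradient(g2, x)

grad2([2, 3])  # [-1398, 8400], same as the indexed version
```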

Also, note:

julia> using BenchmarkTools

julia> @btime grad($([2,3]));
  4.436 μs (4 allocations: 304 bytes)

julia> @btime grad($(SVector(2,3)));
  2.089 ns (0 allocations: 0 bytes)

which is a roughly 2000x performance difference (due, of course, to how simple g is in this case).
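The gap comes from an SVector having its length encoded in its type, so it is an isbits type that can live on the stack; ForwardDiff's work then gets fully unrolled with no heap allocations. One quick way to see the difference:

```julia
using StaticArrays

# SVector{2,Int} is an isbits type: its size is known at compile time,
# so instances can be stack-allocated and passed in registers
isbitstype(SVector{2, Int})  # true

# Vector{Int} is heap-allocated; its length is only known at runtime
isbitstype(Vector{Int})      # false
```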


OK, this looks like what I’m looking for.

Now I’m just chasing syntax that is convenient to me (I will abandon this habit).

We can’t pass a tuple to ForwardDiff like this:

∇g(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), (x, y))

hence we can only use syntax where points are represented as vectors, not tuples.

Yes, for tuples you would convert them to an Array (or, for better performance, a StaticArray):

julia> ∇g(x,y) = ForwardDiff.gradient(z -> g(z[1], z[2]), SVector(x,y))
∇g (generic function with 1 method)

julia> ∇g(1,2)
2-element SArray{Tuple{2},Int64,1,2}:
 -600
 2400

Oh, perfect. Thank you a lot.