I’m rather confused by the number of packages that provide autodiff functionality, and by their peculiarities.
I need to compute the gradient of a multivariable function (e.g. f(x, y), where x and y are Numbers).
I found that AutoDiffSource and ReverseDiffSource are no longer supported.
I managed to do it with the ReverseDiff.jl package in the following way:
using ReverseDiff
f(a, b) = (1 .- a).^2 .+ 100 .* (a .- b.^2).^2
g = (x, y) -> ReverseDiff.gradient(f, (x, y))
g([1.0], [1.0])
But it looks odd to me: I have to pass explicit Arrays and use broadcasted (.^) operations in the function definition. Why?
I managed to compute it with ForwardDiff.jl using derivative, but not gradient:
using ForwardDiff
g(a, b) = (1-a)^2 + 100*(a-b^2)^2
dfdx = (x, y) -> ForwardDiff.derivative(a -> g(a, y), x)  # ∂g/∂a at (x, y)
dfdy = (x, y) -> ForwardDiff.derivative(b -> g(x, b), y)  # ∂g/∂b at (x, y)
@show dfdx(1, 2)
@show dfdy(1, 2)
Are these two solutions equivalent?
What is the right way to solve this problem?
Thanks!
What derivatives do you want? Do you mean:
julia> g(a, b) = (1-a)^2 + 100*(a-b^2)^2
g (generic function with 1 method)
julia> ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0])
2-element Array{Float64,1}:
-600.0
2400.0
Giving you [df/da, df/db] at a = 1.0, b = 2.0
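As for whether the two approaches in the question agree: they should give the same numbers. A quick sanity check (a sketch, assuming both ForwardDiff and ReverseDiff are installed; note I’ve added a sum so the broadcasting version returns a scalar, which gradient requires):

```julia
using ForwardDiff, ReverseDiff

g(a, b) = (1 - a)^2 + 100 * (a - b^2)^2

# ForwardDiff on a vector-valued point
fd = ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0])

# ReverseDiff on the array/broadcasting formulation;
# sum reduces the 1-element array to the scalar output gradient expects
f(a, b) = sum((1 .- a).^2 .+ 100 .* (a .- b.^2).^2)
rd = ReverseDiff.gradient(f, ([1.0], [2.0]))

fd, rd  # both encode the gradient [-600.0, 2400.0] at (1.0, 2.0)
```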
I want grad_f(x, y), which can evaluate the gradient at an arbitrary point (x, y).
julia> grad(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), [x, y])
grad (generic function with 1 method)
julia> grad(2,3)
2-element Array{Int64,1}:
-1398
8400
Points are typically represented in Julia using Vectors (or, for small dimensions, StaticVectors). So writing it as something like
julia> g(x) = (1-x[1])^2 + 100*(x[1]-x[2]^2)^2 # could of course unpack x into a and b here
g (generic function with 2 methods)
julia> grad(x) = ForwardDiff.gradient(g, x)
grad (generic function with 2 methods)
julia> grad([2,3])
2-element Array{Int64,1}:
-1398
8400
julia> using StaticArrays # good for small dim vectors
julia> grad(SVector(2,3))
2-element SArray{Tuple{2},Int64,1,2}:
-1398
8400
might be a bit more natural.
Also, note:
julia> using BenchmarkTools
julia> @btime grad($([2,3]));
4.436 μs (4 allocations: 304 bytes)
julia> @btime grad($(SVector(2,3)));
2.089 ns (0 allocations: 0 bytes)
which is a ~2000x performance difference (due, of course, to how simple g is in this case).
OK, it looks like this is what I’m looking for.
For now I was just chasing the syntax most convenient to me (I will abandon this habit).
We can’t pass a tuple to ForwardDiff like this:
∇g(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), (x, y))
hence we can only use the syntax where points are represented as vectors, not tuples.
Yes, for tuples you would convert them to an Array (or, for better performance, a StaticArray):
julia> ∇g(x,y) = ForwardDiff.gradient(z -> g(z[1], z[2]), SVector(x,y))
∇g (generic function with 1 method)
julia> ∇g(1,2)
2-element SArray{Tuple{2},Int64,1,2}:
-600
2400
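A small variation on the same idea, if indexing into z feels awkward: splat the point into g’s two scalar arguments (a sketch, assuming g(a, b) as defined earlier in the thread):

```julia
using ForwardDiff, StaticArrays

g(a, b) = (1 - a)^2 + 100 * (a - b^2)^2

# Splat the SVector into g's two scalar arguments instead of indexing
∇g(x, y) = ForwardDiff.gradient(z -> g(z...), SVector(x, y))

∇g(1, 2)  # SVector with entries -600 and 2400
```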
Oh, perfect. Thank you very much.