I’m quite confused by the number of packages that provide autodiff functionality and by their peculiarities.
I need to compute the gradient of a multivariable function (e.g. f(x, y), where x and y are Numbers).
I found that AutoDiffSource and ReverseDiffSource are no longer supported.
I managed to do it with the ReverseDiff.jl package as follows:
using ReverseDiff
f(a, b) = (1 - a).^2 + 100*(a - b.^2).^2
g = (x, y) -> ReverseDiff.gradient(f, (x, y))
g([1.0], [1.0])
But it looks odd to me: I have to pass explicit Arrays and write broadcasted operations (.^) in the function definition. Why?
I also managed to compute it with ForwardDiff.jl using derivative, but not gradient:
using ForwardDiff
g(a, b) = (1-a)^2 + 100*(a-b^2)^2
dfdx(x, y) = ForwardDiff.derivative(a -> g(a, y), x)  # ∂g/∂a at (x, y)
dfdy(x, y) = ForwardDiff.derivative(b -> g(x, b), y)  # ∂g/∂b at (x, y)
@show dfdx(1, 2)
@show dfdy(1, 2)
Are these two solutions equivalent?
What is the right way to solve this problem?
Thx!
What derivatives do you want? Do you mean:
julia> g(a, b) = (1-a)^2 + 100*(a-b^2)^2
g (generic function with 1 method)
julia> ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0])
2-element Array{Float64,1}:
-600.0
2400.0
Giving you [df/da, df/db]
at a = 1.0, b = 2.0
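As a sanity check (my own addition, not part of the original answer), the AD result can be compared against hand-derived partials of g:

```julia
using ForwardDiff

g(a, b) = (1 - a)^2 + 100*(a - b^2)^2

# Hand-derived partial derivatives, for checking the AD output:
dgda(a, b) = -2*(1 - a) + 200*(a - b^2)   # ∂g/∂a
dgdb(a, b) = -400*b*(a - b^2)             # ∂g/∂b

fd = ForwardDiff.gradient(z -> g(z[1], z[2]), [1.0, 2.0])
fd ≈ [dgda(1.0, 2.0), dgdb(1.0, 2.0)]  # true: both give [-600.0, 2400.0]
```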
I want grad_f(x, y), which can evaluate the gradient at an arbitrary point (x, y).
julia> grad(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), [x, y])
grad (generic function with 1 method)
julia> grad(2,3)
2-element Array{Int64,1}:
-1398
8400
Points are typically defined in Julia using Vectors (or, for small dimensions, StaticVectors). So writing it as something like
julia> g(x) = (1-x[1])^2 + 100*(x[1]-x[2]^2)^2 # could of course unpack x into a and b here
g (generic function with 2 methods)
julia> grad(x) = ForwardDiff.gradient(g, x)
grad (generic function with 2 methods)
julia> grad([2,3])
2-element Array{Int64,1}:
-1398
8400
julia> using StaticArrays # good for small dim vectors
julia> grad(SVector(2,3))
2-element SArray{Tuple{2},Int64,1,2}:
-1398
8400
might be a bit more natural.
Also, note:
julia> using BenchmarkTools
julia> @btime grad($([2,3]));
4.436 μs (4 allocations: 304 bytes)
julia> @btime grad($(SVector(2,3)));
2.089 ns (0 allocations: 0 bytes)
which is a ~2000x performance difference (due, of course, to how simple g is in this case)
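If you do stay with plain Vectors, one way to cut the per-call allocations (a sketch of my own, not from the thread) is to preallocate a GradientConfig and write the result into a buffer with gradient!:

```julia
using ForwardDiff

g(x) = (1 - x[1])^2 + 100*(x[1] - x[2]^2)^2

# Preallocate the config and an output buffer once, then reuse them
# across repeated gradient evaluations to avoid most per-call allocations.
x = [2.0, 3.0]
cfg = ForwardDiff.GradientConfig(g, x)
out = similar(x)
ForwardDiff.gradient!(out, g, x, cfg)
out  # [-1398.0, 8400.0]
```

For small fixed dimensions, the SVector route above is still simpler and faster.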
Ok, looks like this is what I’m looking for.
At this point I’m just after syntax that is convenient to me (a habit I will abandon).
We can’t pass a tuple to ForwardDiff like this:
∇g(x, y) = ForwardDiff.gradient(z -> g(z[1], z[2]), (x, y))
hence we can only use the syntax where points are represented as vectors, not tuples.
Yes, for tuples you would convert to an Array (or, for better performance, an SVector):
julia> ∇g(x,y) = ForwardDiff.gradient(z -> g(z[1], z[2]), SVector(x,y))
∇g (generic function with 1 method)
julia> ∇g(1,2)
2-element SArray{Tuple{2},Int64,1,2}:
-600
2400
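If you really want to call it with the point packed as a tuple, a thin wrapper (my own sketch, not from the thread) can splat the tuple into an SVector internally:

```julia
using ForwardDiff, StaticArrays

g(a, b) = (1 - a)^2 + 100*(a - b^2)^2

# Hypothetical convenience method: accept the point as a tuple.
∇g(p::Tuple) = ForwardDiff.gradient(z -> g(z[1], z[2]), SVector(p...))

∇g((1, 2))  # same values as ∇g(1, 2) above: [-600, 2400]
```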
Oh, perfect. Thank you a lot.