Optim.jl OnceDifferentiable Baby Steps


#1

Automatic differentiation seems a little like magic. ?OnceDifferentiable is not very informative. Experimenting with it, I want to see, e.g., what happens if my function uses abs(). As always, I want to start with something simple:

julia> using Optim

julia> myfun(x::Vector{Float64})= (x[1]-2)^2 + (x[2]-3)^2 + (x[1]*x[2])^2;

julia> od = OnceDifferentiable(myfun, [ 0.0, 0.0 ]; autodiff = :forward);

I hope my two questions now are simple:

  • Why does an automatic differentiator need an initial_x value, at all? Is it to understand the dimensionality of the function (here 2)? Would it be better, then, to have an int argument requesting the length?

  • How can I use the resulting od object to request the gradient at a particular location? I can use od.f([2.0, 3.0]) to evaluate the function, but how do I obtain the gradient at the point [2.0, 3.0]?

Advice appreciated.

PS: Does Optim.DifferentiableFunction still exist?


#2

You will need to remove the Float64 type annotation in myfun to use ForwardDiff. Then the following should work.

julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + (x[1]*x[2])^2;

julia> od = OnceDifferentiable(myfun, [0.0, 0.0]; autodiff = :forward);

julia> od.f([2.,2.])
17.0

julia> grad = zeros(2);

julia> od.df(grad,[2.,2.])
2-element Array{Float64,1}:
 16.0
 14.0

julia> grad
2-element Array{Float64,1}:
 16.0
 14.0

No.

I will leave the first question to others, since I haven’t read the auto-diff source code.


#3

Consider playing around with https://github.com/JuliaDiff/ForwardDiff.jl directly to get a feel for how it works first.

Automatic differentiation sure feels magical, but you will see that it is more chain rule than magic. See http://www.juliadiff.org/ForwardDiff.jl/latest/dev/how_it_works.html for a very simple explanation, which hopefully also explains why you need to “seed” the calculation at a particular point.


#4

thanks everyone. I think my guess is right that the initial value is used to determine dimensionality. it makes even less sense in od.df, where the second argument already supplies it, but then that function is not meant to be called directly by an end user anyway.

I am looking at ForwardDiff. what had confused me is that I had thought it would not be able to decompile the function being passed in, but now I realize that it overloads all the known elementary julia math functions to propagate derivatives alongside values, so when the function (not a string containing an unevaluated function) is passed in, it can apply the chain rule operation by operation. clever.
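For intuition, that mechanism can be sketched with a hand-rolled dual number. This is a simplified illustration only; ForwardDiff's actual Dual type is far more general (vectors, nesting, etc.):

```julia
# A minimal dual number: carries a value and a derivative (the "seed").
struct Dual
    val::Float64   # f(x)
    der::Float64   # f'(x)
end

# Overload elementary operations with their chain rules.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:-(a::Dual, b::Real) = Dual(a.val - b, a.der)

# Differentiate f(x) = (x - 2)^2 at x = 3 by seeding der = 1.
f(x) = (x - 2) * (x - 2)
d = f(Dual(3.0, 1.0))
(d.val, d.der)   # value 1.0, derivative 2.0
```

Seeding der = 1.0 at a concrete x is exactly why a particular point is needed: the derivative is computed numerically alongside the value, never symbolically.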

I think there is a small math mistake with abs (and kinks) in it:

julia> f(x)= sum(abs.(x))
f (generic function with 1 method)

julia> ForwardDiff.gradient(f, [0.0])
1-element Array{Float64,1}:
 1.0

should probably give NaN or Inf. But this is minor. thanks everyone.

Clarification: Julia has -0.0 and +0.0. In this case, it recognizes -0.0. maybe this is desired behavior??


#5

The gradient of the abs function at 0 is not defined, so it is expected that you know the consequences of asking for a derivative at a point where it does not exist. In many optimization problems where the objective is not smooth, however, it suffices to return any value in the subgradient set, which for abs at 0 is [-1, 1]. This means that many algorithms for smooth optimization will also work for many non-smooth optimization problems.
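As an illustration, a hand-picked subgradient for abs could look like this. This is just one convention (0 at the kink), not ForwardDiff's behavior; any value in [-1, 1] at 0 is an equally valid choice:

```julia
# Subgradient of abs: sign(x) away from the kink, 0 at x == 0 by convention.
subgrad_abs(x) = x > 0 ? 1.0 : (x < 0 ? -1.0 : 0.0)

subgrad_abs(2.0)   # 1.0
subgrad_abs(-3.0)  # -1.0
subgrad_abs(0.0)   # 0.0, but any value in [-1, 1] would do
```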


#6

od.df and od.fdf each take a preallocated gradient vector and an input. They update the contents of the gradient vector with the gradient, and od.fdf also returns the value of f.
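For example, with the myfun from earlier in the thread, od.fdf could be called like this (field names as in the Optim version discussed here; they may differ across releases):

```julia
using Optim

myfun(x) = (x[1] - 2)^2 + (x[2] - 3)^2 + (x[1] * x[2])^2
od = OnceDifferentiable(myfun, [0.0, 0.0]; autodiff = :forward)

G = zeros(2)                  # preallocated gradient buffer
fx = od.fdf(G, [2.0, 2.0])    # fills G in place and returns f([2.0, 2.0])
# fx == 17.0; G now holds [16.0, 14.0], matching od.df above
```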

When creating a OnceDifferentiable object, the constructors use both the length and the element type of x.

Use fieldnames(typeof(od)) to see all the fields.


#7

The initial_x input is not related to automatic differentiation. The OnceDifferentiable struct in Optim / NLSolversBase is used as a helper to access the objective function and gradient (see my answer below). In addition, OnceDifferentiable is a caching device that stores the last x-values at which the function and gradients were evaluated. initial_x is there to “initiate” this caching.

The autodiff = :forward keyword tells the OnceDifferentiable constructor to evaluate gradients of the objective with ForwardDiff; otherwise it will use finite differences. You can also supply the gradient function yourself (there are some examples at NLSolversBase.jl).
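For instance, a hand-written gradient can be supplied via the three-argument constructor (a sketch; the same constructor form appears later in this thread):

```julia
using Optim  # re-exports OnceDifferentiable from NLSolversBase

f(x) = (x[1] - 2)^2 + (x[2] - 3)^2
function g!(G, x)            # must write into G in place
    G[1] = 2 * (x[1] - 2)
    G[2] = 2 * (x[2] - 3)
end

od = OnceDifferentiable(f, g!, [0.0, 0.0])
res = optimize(od, [0.0, 0.0], BFGS())
# Optim.minimizer(res) should be approximately [2.0, 3.0]
```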

NLSolversBase provides an API to evaluate OnceDifferentiable-objects. Look here


#8

thanks, anriseth. How is this NLSolversBase interface used? I have been trying a simple example, and it’s not obvious to me. Take the function above.

julia> using Optim, NLSolversBase

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> od = OnceDifferentiable(myfun, [ 3.0, 3.0 ]; autodiff = :forward);

julia> NLSolversBase.gradient(od, [3.0, 3.0])   ## works as expected.
2-element Array{Float64,1}:
 2.0
 0.0

julia> NLSolversBase.gradient(od, [3.0, 3.0])   ## wth?? for max confusion?
2-element Array{Float64,1}:
 NaN
 NaN

julia> td = TwiceDifferentiable(myfun, [ 3.0, 3.0 ]; autodiff = :forward);

julia> NLSolversBase.gradient(td, [3.0, 3.0])   ## huh?
2-element Array{Float64,1}:
 2.25826e-314
 0.0

julia> NLSolversBase.hessian(td, [3.0,3.0])
ERROR: MethodError: no method matching hessian(::NLSolversBase.TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1})
Closest candidates are:

And what are the admissible keywords to autodiff? I see :forward. Are there others?

PS:

julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> td = TwiceDifferentiable(myfun, [ 3, 3 ]; autodiff = :forward);
ERROR: InexactError()

The vector should probably be promoted to Float64 automatically.


#9

I spoke too soon regarding initial_x. It is used to pass type information etc. to ForwardDiff and DiffEqDiffTools in order to create caches for differentiation.

Regarding NLSolversBase, what you want to do is the following:

x0 = [3.0,3.0]
od = OnceDifferentiable(myfun, x0; autodiff=:forward)

gradient!(od, x0)
gval = gradient(od)

You seem to have discovered a bug, and I’ll report it. You are unlikely to ever need to call gradient(od, x), however, and the approach above is more likely to be what you want.
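The same mutate-then-read pattern works for the objective value, assuming NLSolversBase's value!/value accessors, which mirror gradient!/gradient:

```julia
using Optim, NLSolversBase

myfun(x) = (x[1] - 2)^2 + (x[2] - 3)^2 + ((x[1] - 2) * (x[2] - 3))^2
x0 = [3.0, 3.0]
od = OnceDifferentiable(myfun, x0; autodiff = :forward)

value!(od, x0)        # evaluate and cache f(x0)
fval = value(od)      # read the cached value: 1.0 here

gradient!(od, x0)     # evaluate and cache the gradient at x0
gval = gradient(od)   # [2.0, 0.0]
```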

You are welcome to create an issue or submit a PR :wink:


#10

Yeah, I admit it was a mistake to remove the autodiff keyword; then again, constructing a OnceDifferentiable directly is probably not something most users are meant to do.

I don’t think it should, but sure, we can take that discussion again if you open an issue.


#11

Gentlemen, can I beg you to report it if some of my code has stumbled over bugs? See, I am not certain what my code's output should be. The docs are not ready yet, so I am trying to learn Optim.jl by experimentation, and I just post whenever I find an inconsistency or get stuck.

I am trying to write a “starter” chapter about Optim at http://julia.cookbook.tips/doku.php?id=minimasn , so I hope that as I get clearer over time, you will get fewer dumb questions from me and others.

The following may or may not be the same bug or a different one; here it bites on ordinary optimize use.

myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= [ [ 2 + 2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3) ], [ 4*(x[1]-2)*(x[2]-3) , 2 + 2*(x[1]-2)^2 ] ]
 
function f(x) myfun(x) end#function
function g!(G,x) G=myfungradients(x) end#function
function fg!(G,x) g!(G,x); f(x) end#function  ## fg! can be used to share calculations between f and g!
function h!(H,x) H=myfunhessian(x) end#function
 
using Optim
z2= zeros(2)
nd = NonDifferentiable( f, z2 )
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )
 
println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN,NaN], [2.0, 2.0] ) ) )
 
println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )
 
optimize( td, z2, BFGS())  ## works
 
optimize( od, z2, BFGS())  ## error

I will post another question in another thread.


#12

This is not a recommended style since it allocates memory for the output.

The idea of g! is that it should modify the object that G refers to in-place. What you are doing here is simply rebinding the local G name to the array output by myfungradients(x), but outside the function, the passed object was not modified. See this for more details http://www.johnmyleswhite.com/notebook/2014/09/06/values-vs-bindings-the-map-is-not-the-territory/. When you modify the object that G binds to in-place, you are not allocating additional memory so that’s efficient.
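The difference can be seen in a few lines (a minimal illustration of rebinding versus in-place mutation):

```julia
function rebind!(G, x)
    G = 2 .* x        # rebinds the *local* name G; caller's array untouched
end

function mutate!(G, x)
    G .= 2 .* x       # writes into the caller's array in place
end

G = zeros(2)
rebind!(G, [1.0, 2.0]); println(G)  # still [0.0, 0.0]
mutate!(G, [1.0, 2.0]); println(G)  # now [2.0, 4.0]
```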

I am actually surprised any of them works at all!

See these 2 examples:

  1. http://julianlsolvers.github.io/Optim.jl/stable/algo/cg/
  2. http://julianlsolvers.github.io/Optim.jl/stable/algo/autodiff/

#13

the worst part is when bad stuff sometimes works or almost works. if it always bombs badly, that's great.

I understand now. yes, I knew the difference, but it had not dawned on me that this was essential in this context, too. this is all making sense to me now. not just madness. madness with method. I now understand why the API for g! and h! is as it is.

alas, logically, this is somewhat painful. I won’t be the last person to screw this up. is there some clever way for Julia to construct it, rather than pushing the burden onto the unsuspecting end user?

then again, this is probably not worth too much bother. the user may as well use automatic differentiation. symbolic provision may be on the way out.

thanks a lot mohamed.


#14

EDIT: I have made hundreds of edits here, and only just realised why your example fails. Your function g!(G,x) does not write the values of the gradient into G; rather, it rebinds G to another vector. This is a subtlety of how Julia works, rather than of Optim.

G = zeros(2)
g!(G, zeros(2))
G != myfungradients(zeros(2)) # You have not actually updated `G` in the outer scope.

OLD:
That is a bit weird. It is helpful if you provide us with the error message as well as the versions of Optim et al that you are using.

Note that the following works

using Optim

f(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2
function g!(out,x)
    out[1] = 2*(1 + (x[2]-3)^2)*(x[1]-2)
    out[2] = 2*(1 + (x[1]-2)^2)*(x[2]-3)
end
function h!(out, x)
    out[1,1] =  2 + 2*(x[2]-3)^2
    out[1,2] = 4*(x[1]-2)*(x[2]-3)
    out[2,2] =  2 + 2*(x[1]-2)^2
    out[2,1] = out[1,2]
end

z2= zeros(2)
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )


optimize(f, g!, z2, BFGS())
optimize(td, z2, BFGS())  
optimize(od, z2, BFGS())  

#15

sorry. indeed, mohamed pointed it out. for others, here is a correct example:

myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3) , 2+2*(x[1]-2)^2 ], 2, 2 )

function f(x) myfun(x) end#function
function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end#function  ## must not rebind G to a new array!
function fg!(G,x) g!(G,x); f(x) end#function  ## fg! can be used to share calculations between f and g!
function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x),j=1:length(x) ; H[i,j]= Hcp[i,j]; end; end#function

using Optim
z2= zeros(2)
nd = NonDifferentiable( f, z2 )
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )

println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) )
println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )

optimize( td, z2, BFGS())
optimize( od, z2, BFGS())

this can be sped up by avoiding the Gcp and Hcp copies, but at least it is correct.


#16

If you’re interested, you can see my latest PR here https://github.com/JuliaNLSolvers/Optim.jl/pull/577

It basically adds keywords that remove two of the annoyances you've seen. The first is an autodiff keyword at the optimize level, so you don't have to worry about OnceDifferentiable and so on; the other is an inplace keyword, also at the optimize level, that allows you to pass your f(x), g(x), and h(x) style functions by setting inplace = false. Note that before users can use this, the PR first has to be merged (I still need more tests and to change the documentation), then we have to tag a new version, and then it has to be merged in METADATA.jl, so it might be a little while before it goes “live”.
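Based on that description, the usage would look roughly like this (a sketch only; exact keyword names could still change before the PR is merged and released):

```julia
using Optim

f(x) = (x[1] - 2)^2 + (x[2] - 3)^2
g(x) = [2 * (x[1] - 2), 2 * (x[2] - 3)]   # out-of-place gradient

# autodiff at the optimize level, no OnceDifferentiable needed:
optimize(f, zeros(2), BFGS(); autodiff = :forward)

# out-of-place f(x)/g(x) style functions via the inplace keyword:
optimize(f, g, zeros(2), BFGS(); inplace = false)
```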


#17

thanks. this sounds great. regards, /iaw


#18

Note that, in Julia 0.6, you can assign into an existing array with a “dot”: G .= myfungradients(x).
https://docs.julialang.org/en/stable/manual/functions/#man-vectorized-1
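Applied to the g! from the example above, this collapses the element-by-element copy into one line with the same in-place behavior:

```julia
function g!(G, x)
    G .= myfungradients(x)  # broadcast-assign into the existing array
end
```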


#19

In DiffWrappers.jl, I provide the following interface (used by DynamicHMC.jl):

  1. the user codes a \mathbb{R}^n \to \mathbb{R} function f, i.e. one that takes an <:AbstractVector and returns a <:Real,

  2. Then

    ForwardGradientWrapper(f, n)
    

    can be used to generate a callable that returns values and derivatives using a DiffResult object. The dimension can also be queried by length.

I think it would be great to share a common interface (and implementation, using forward- and later reverse-mode AD) for this pattern that is independent of packages.
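Without DiffWrappers, the value-plus-gradient-in-one-pass pattern can be sketched directly with ForwardDiff and DiffResults (assuming those packages are available; ForwardGradientWrapper presumably builds on something similar):

```julia
using ForwardDiff, DiffResults

f(x) = sum(abs2, x .- 1)     # an R^n -> R example

x = [2.0, 3.0]
result = DiffResults.GradientResult(x)   # holds both value and gradient
ForwardDiff.gradient!(result, f, x)      # one pass fills both

DiffResults.value(result)     # 5.0
DiffResults.gradient(result)  # [2.0, 4.0]
```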


#20

that’s great. thanks for pointing this out.