# Conjugate Gradient optimization with set of functions

This post is related to a previous post of mine, but I preferred to present the problem separately. I have to optimize a set of functions with conjugate gradient optimization, but I have to define the whole set of functions first. The problem is that I have a lot of functions to define, and the number of arguments is variable. To keep it simple, I will take the example of a maximum of three arguments: x_1, x_2 and x_3. The coefficients of the functions are given by a matrix X with the same number of columns as the number of existing arguments, say

``````
X = rand(5, 3)
``````

With the three arguments, I can create 7 functions: f(x_1), f(x_2), f(x_3), f(x_1,x_2), f(x_1,x_3), f(x_2,x_3) and f(x_1,x_2,x_3). The coefficients of each function are obtained by selecting the corresponding submatrix of X, with each column corresponding to one variable. For example, for the function f(x_2,x_3), I take

``````
Xsub = X[:, 2:3]
``````
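The seven column subsets can be enumerated programmatically instead of by hand; a sketch, assuming the Combinatorics.jl package is available (it is not part of the original code):

``````
using Combinatorics  # assumed dependency, not in the original post

X = rand(5, 3)
for cols in combinations(1:size(X, 2))  # all nonempty subsets: [1], [2], ..., [1, 2, 3]
    Xsub = X[:, cols]  # coefficients for the function of those variables
    # build/optimize the function for this subset here
end
``````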

and I can construct the function to optimize:

``````
using Optim

# Objective for the pair (x2, x3), using the columns selected in Xsub
f(x2, x3) = sum([3 * transpose(Xsub[i, :]) * [x2^2, x3^2] for i in 1:size(X, 1)])

# In-place gradient of f
function g!(storage, x2, x3)
    storage[1] = sum([6 * Xsub[i, 1] * x2 for i in 1:size(X, 1)])
    storage[2] = sum([6 * Xsub[i, 2] * x3 for i in 1:size(X, 1)])
end
``````

Is it possible to write a single piece of code that creates every possible subset of variables and builds all the corresponding functions exactly as above to be optimized, rather than defining all 7 functions separately?

First, a quick note that you can write that more succinctly, and probably quicker to compute, as:

``````
f(x2, x3) = 3sum(Xsub * [x2^2, x3^2])
``````

But given that, a general strategy might be to parameterize your functions so that one Julia function encapsulates them all (assuming this can actually be done in your case; to know, we'd probably need more examples of what these functions are). E.g. something like:

``````
f(x1, x2, x3, a, c1, p1, c2, p2, c3, p3) = a * sum(X * [c1*x1^p1, c2*x2^p2, c3*x3^p3])
``````

and then optimization involves looping over different values of the parameters of the function rather than over different functions. For the cases where you don't "have" a certain argument, you would just set its coefficient to zero; having a zero gradient in a given direction doesn't trip up `ConjugateGradient()`.


Ah, thank you for your quick note @marius311! The general strategy seems good! I will try it, because I had started coding the function with all variables! So I can set the corresponding parameters to zero when I want a subfunction, right? And what about the equations for the gradient, since the number of equations varies with the number of variables? Do I also set some parameters so that `storage[j] = 0` for the variables that don't appear? And how can I set the parameters to zero in the command `optimize(f, g!, ones(2), ConjugateGradient())`? With `map(x -> optimize(f(x), g!(x), ...), ...)`?

If your functions are as simple as the example here, I wouldn't even hand-code the gradient; I'd bet Julia's automatic differentiation libraries are going to do really well on them. I forget exactly the interface, but Optim may even do it automatically for you.

I was imagining something along the lines of:

``````
for (a, c1, p1, c2, p2, c3, p3) in possible_configurations
    # build/optimize the function for this configuration
end
``````

and then `possible_configurations` is a list of things you want, some of which have the coefficients set to zero.
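One way to build that list is a comprehension over every on/off pattern of the coefficients (a sketch; the fixed values of `a` and the exponents are illustrative, not from the thread):

``````
# Illustrative: fix a and the exponents, vary which coefficients are "on"
a, p1, p2, p3 = 1.0, 2, 2, 2
possible_configurations = [(a, c1, p1, c2, p2, c3, p3)
                           for c1 in (0.0, 1.0), c2 in (0.0, 1.0), c3 in (0.0, 1.0)
                           if c1 + c2 + c3 > 0]  # drop the all-zero case, leaving 7 configurations
``````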


I will explore your loop proposal, thanks! It seems a nice idea! For the optimization, I must provide the gradient when I use `ConjugateGradient()`: it has fewer evaluations to do and converges quicker than a gradient-free optimizer.

Ah wait! I just realized something: in the help page of `Optim.jl` there is an example with `ConjugateGradient()`, the 2D Rosenbrock function. In the example, they provide the gradient function. I tried to run it without the gradient function and it worked. But when we do not provide the gradient function, the number of gradient calls is higher. When we provide the gradient function with `g!`, it lowers the number of gradient calls.

I'm too sad… I tried to apply your suggestions to one of my functions and it doesn't work:

``````
X = rand(50, 6)

# Build the coefficient vector once; each c_j switches variable j on (1) or off (0)
function h(beta0, beta1, beta2, beta3, beta4, beta5, c1, c2, c3, c4, c5)
    betas = [beta0, c1*beta1, c2*beta2, c3*beta3, c4*beta4, c5*beta5]
    -sum(X * betas) + sum(log.(1 .+ exp.(X * betas))) +
        (sum([c1, c2, c3, c4, c5]) + 1) / 2 * log(2 * pi) +
        sum(betas .^ 2) / 2
end
``````

To test the function, I tried:

``````
optimize((beta0, beta1, beta2, beta3, beta4, beta5) -> h(beta0, beta1, beta2, beta3, beta4, beta5, 1.0, 1.0, 1.0, 1.0, 1.0), ones(6), ConjugateGradient())
``````

but I get the following error:

``````
ERROR: MethodError: no method matching (::getfield(Main, Symbol("##57#58")))(::Array{Float64,1})
Closest candidates are:
#57(::Any, ::Any, ::Any, ::Any, ::Any, ::Any) at none:1
Stacktrace:
[3] (::getfield(NLSolversBase, Symbol("#g!#15")){getfield(Main, Symbol("##57#58")),DiffEqDiffTools.GradientCache{Nothing,Nothing,Nothing,Val{:central},Float64,Val{true}}})(::Array{Float64,1}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\NLSolversBase\NsXIC\src\objective_types\oncedifferentiable.jl:56
[4] (::getfield(NLSolversBase, Symbol("#fg!#16")){getfield(Main, Symbol("##57#58"))})(::Array{Float64,1}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\NLSolversBase\NsXIC\src\objective_types\oncedifferentiable.jl:60
[6] initial_state(::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\solvers\first_order\cg.jl:113
[7] optimize at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\optimize.jl:33 [inlined]
[8] #optimize#93(::Bool, ::Symbol, ::Function, ::Function, ::Array{Float64,1}, ::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\interface.jl:116
[9] optimize(::Function, ::Array{Float64,1}, ::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\interface.jl:115 (repeats 2 times)
[10] top-level scope at none:0
``````

I think it perhaps comes from an Integer or Float problem, but I don't see where… When I write
`h(2,3,2,2,2,3,1,1,1,1,1)` for example, it works and returns a value.

Otherwise, how should I configure `possible_configurations` in the `for` loop for Julia to take `(c1,c2,c3,c4,c5)` as a vector of values at each iteration? With `zip`?

The issue is just that Optim passes you an array rather than individual parameters, so you could just do something like:

``````
f(betas) = h(betas..., 1, 1, 1, 1, 1)

optimize(
    f,
    ones(6),
    ConjugateGradient();
    autodiff = :forward,
    x_tol = 1e-7,
)
``````

I looked up the Optim automatic differentiation API and it's as easy as adding that `autodiff` parameter, so no need to hand-code gradients.
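Putting the loop idea and this closure pattern together, one hedged sketch (the configuration list shown is purely illustrative, and passing `autodiff = :forward` alongside `ConjugateGradient()` is assumed to work as above):

``````
using Optim

X = rand(50, 6)

# Same objective as above, written once over a coefficient vector;
# `cs` switches each of the five non-intercept variables on or off
function h(betas, cs)
    b = vcat(betas[1], cs .* betas[2:6])
    -sum(X * b) + sum(log.(1 .+ exp.(X * b))) +
        (sum(cs) + 1) / 2 * log(2 * pi) + sum(b .^ 2) / 2
end

# Illustrative configurations: all variables on, then only the first two
possible_configurations = [ones(5), [1.0, 1.0, 0.0, 0.0, 0.0]]

results = [optimize(betas -> h(betas, cs), ones(6), ConjugateGradient();
                    autodiff = :forward)
           for cs in possible_configurations]
``````

Each entry of `results` then holds the optimum for one subset of variables.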


Thanks @marius311, it worked perfectly! Now I just have to code the `for` loop as you suggested. I'll keep you posted!

I was able to implement your loop suggestion with this new way of passing the function to Optim!! Thanks a lot, @marius311!!!
