# Conjugate Gradient optimization with set of functions

This post is related to a previous post of mine, but I preferred to present the problem separately. I have to optimize a set of functions with conjugate gradient optimization, but I have to define the whole set of functions first. The problem is that I have a lot of functions to define, and the number of arguments is variable. To keep it simple, I will take the example of a maximum of three arguments: x_1, x_2 and x_3. The coefficients of the functions are given by a matrix X with the same number of columns as the number of existing arguments, say

``````
X = rand(5, 3)
``````

With the three arguments, I can create 7 functions: f(x_1), f(x_2), f(x_3), f(x_1,x_2), f(x_1,x_3), f(x_2,x_3) and f(x_1,x_2,x_3). The coefficients of each function are obtained by selecting the corresponding submatrix of X, with each column corresponding to one variable. For example, for the function f(x_2,x_3), I take

``````
Xsub = X[:, 2:3]
``````
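The seven column subsets can be enumerated programmatically instead of by hand; a sketch, assuming the Combinatorics.jl package is available (it is not part of the original code):

``````
using Combinatorics  # assumed dependency, not in the original post

X = rand(5, 3)
for cols in combinations(1:size(X, 2))  # all nonempty subsets: [1], [2], ..., [1, 2, 3]
    Xsub = X[:, cols]  # coefficients for the function of those variables
    # build/optimize the function for this subset here
end
``````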

and I can construct the function to optimize:

``````
using Optim

# Objective for the pair (x2, x3), using the columns selected in Xsub
f(x2, x3) = sum([3 * transpose(Xsub[i, :]) * [x2^2, x3^2] for i in 1:size(X, 1)])

# In-place gradient of f
function g!(storage, x2, x3)
    storage[1] = sum([6 * Xsub[i, 1] * x2 for i in 1:size(X, 1)])
    storage[2] = sum([6 * Xsub[i, 2] * x3 for i in 1:size(X, 1)])
end
``````

Is it possible to write a single piece of code that creates every possible subset of variables and builds all the corresponding functions exactly as above to be optimized, rather than defining all 7 functions separately?

First, a quick note that you can write that more succinctly, and probably quicker to compute, as:

``````
f(x2, x3) = 3sum(Xsub * [x2^2, x3^2])
``````

But given that, a general strategy might be to parameterize your functions so that one Julia function encapsulates them all (assuming this can actually be done in your case; to know, we'd probably need more examples of what these functions are). E.g. something like:

``````
f(x1, x2, x3, a, c1, p1, c2, p2, c3, p3) = a * sum(X * [c1*x1^p1, c2*x2^p2, c3*x3^p3])
``````

and then optimization involves looping over different values of the parameters of the function rather than over different functions. For the cases where you don't "have" a certain argument, you would just set its coefficient to zero; having a zero gradient in a given direction doesn't trip up `ConjugateGradient()`.


Ah, thank you for your quick note @marius311! The general strategy seems good! I will try it, because I had started coding the function with all variables! So I can set the corresponding parameters to zero when I want a subfunction, right? And what about the equations for the gradient, since the number of equations varies with the number of variables? Do I also set some parameters so that `storage[j] = 0` for the variables that don't appear? And how can I set the parameters to zero in the command `optimize(f, g!, ones(2), ConjugateGradient())`? With `map(x -> optimize(f(x), g!(x), ...), ...)`?

If your functions are as simple as the example here, I wouldn't even hand-code the gradient; I'd bet Julia's automatic differentiation libraries are going to do really well on them. I forget exactly the interface, but Optim may even do it automatically for you.

I was imagining something along the lines of:

``````
for (a, c1, p1, c2, p2, c3, p3) in possible_configurations
    # build/optimize the function for this configuration
end
``````

and then `possible_configurations` is a list of things you want, some of which have the coefficients set to zero.
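One way to build that list is a comprehension over every on/off pattern of the coefficients (a sketch; the fixed values of `a` and the exponents are illustrative, not from the thread):

``````
# Illustrative: fix a and the exponents, vary which coefficients are "on"
a, p1, p2, p3 = 1.0, 2, 2, 2
possible_configurations = [(a, c1, p1, c2, p2, c3, p3)
                           for c1 in (0.0, 1.0), c2 in (0.0, 1.0), c3 in (0.0, 1.0)
                           if c1 + c2 + c3 > 0]  # drop the all-zero case, leaving 7 configurations
``````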


I will explore your loop proposal, thanks! It seems a nice idea! For the optimization, I must provide the gradient when I use `ConjugateGradient()`: it has fewer evaluations to do and converges quicker than a gradient-free optimizer.

Ah wait! I just realized something: in the help page of `Optim.jl` there is an example with `ConjugateGradient()`, the 2D Rosenbrock function. In the example, they provide the gradient function. I tried to run it without the gradient function and it worked. But when we do not provide the gradient function, the number of gradient calls is higher. When we provide the gradient function with `g!`, it lowers the number of gradient calls.

I'm too sad… I tried to apply your suggestions to one of my functions and it doesn't work:

``````
X = rand(50, 6)

# Build the coefficient vector once; each c_j switches variable j on (1) or off (0)
function h(beta0, beta1, beta2, beta3, beta4, beta5, c1, c2, c3, c4, c5)
    betas = [beta0, c1*beta1, c2*beta2, c3*beta3, c4*beta4, c5*beta5]
    -sum(X * betas) + sum(log.(1 .+ exp.(X * betas))) +
        (sum([c1, c2, c3, c4, c5]) + 1) / 2 * log(2 * pi) +
        sum(betas .^ 2) / 2
end
``````

To test the function, I tried:

``````
optimize((beta0, beta1, beta2, beta3, beta4, beta5) -> h(beta0, beta1, beta2, beta3, beta4, beta5, 1.0, 1.0, 1.0, 1.0, 1.0), ones(6), ConjugateGradient())
``````

but I get the following error:

``````
ERROR: MethodError: no method matching (::getfield(Main, Symbol("##57#58")))(::Array{Float64,1})
Closest candidates are:
#57(::Any, ::Any, ::Any, ::Any, ::Any, ::Any) at none:1
Stacktrace:
[3] (::getfield(NLSolversBase, Symbol("#g!#15")){getfield(Main, Symbol("##57#58")),DiffEqDiffTools.GradientCache{Nothing,Nothing,Nothing,Val{:central},Float64,Val{true}}})(::Array{Float64,1}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\NLSolversBase\NsXIC\src\objective_types\oncedifferentiable.jl:56
[4] (::getfield(NLSolversBase, Symbol("#fg!#16")){getfield(Main, Symbol("##57#58"))})(::Array{Float64,1}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\NLSolversBase\NsXIC\src\objective_types\oncedifferentiable.jl:60
[6] initial_state(::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\solvers\first_order\cg.jl:113
[7] optimize at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\optimize.jl:33 [inlined]
[8] #optimize#93(::Bool, ::Symbol, ::Function, ::Function, ::Array{Float64,1}, ::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\interface.jl:116
[9] optimize(::Function, ::Array{Float64,1}, ::ConjugateGradient{Float64,Nothing,getfield(Optim, Symbol("##30#32")),LineSearches.InitialHagerZhang{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at C:\Users\User\.julia\packages\Optim\L5T76\src\multivariate\optimize\interface.jl:115 (repeats 2 times)
[10] top-level scope at none:0
``````

I think it perhaps comes from an Integer or Float problem, but I don't see where… When I write
`h(2,3,2,2,2,3,1,1,1,1,1)` for example, it works and returns a value.

Otherwise, how should I configure `possible_configurations` in the `for` loop for Julia to take `(c1,c2,c3,c4,c5)` as a vector of values at each iteration? With `zip`?

The issue is just that Optim passes you an array rather than individual parameters, so you could just do something like:

``````
f(betas) = h(betas..., 1, 1, 1, 1, 1)

optimize(
    f,
    ones(6),
    ConjugateGradient();
    autodiff = :forward,
    x_tol = 1e-7,
)
``````

I looked up the Optim automatic differentiation API and it's as easy as adding that `autodiff` parameter, so no need to hand-code gradients.
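Putting the loop idea and this closure pattern together, one hedged sketch (the configuration list shown is purely illustrative, and passing `autodiff = :forward` alongside `ConjugateGradient()` is assumed to work as above):

``````
using Optim

X = rand(50, 6)

# Same objective as above, written once over a coefficient vector;
# `cs` switches each of the five non-intercept variables on or off
function h(betas, cs)
    b = vcat(betas[1], cs .* betas[2:6])
    -sum(X * b) + sum(log.(1 .+ exp.(X * b))) +
        (sum(cs) + 1) / 2 * log(2 * pi) + sum(b .^ 2) / 2
end

# Illustrative configurations: all variables on, then only the first two
possible_configurations = [ones(5), [1.0, 1.0, 0.0, 0.0, 0.0]]

results = [optimize(betas -> h(betas, cs), ones(6), ConjugateGradient();
                    autodiff = :forward)
           for cs in possible_configurations]
``````

Each entry of `results` then holds the optimum for one subset of variables.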


Thanks @marius311, it worked perfectly! Now I just have to code the `for` loop as you suggested. I'll keep you posted!

I was able to implement your loop suggestion with this new way of passing the function to Optim!! Thanks a lot, @marius311!!!
