How to use LBFGS to minimise the cost function for regularised logistic regression

I am currently stuck trying to use the Optim package in Julia to minimize a cost function: the cost function for L2-regularised logistic regression. It is constructed as follows:

using Optim

function regularised_cost(X, y, θ, λ)
    m = length(y)

    # Sigmoid predictions
    h = sigmoid(X * θ)

    # left side of the cost function
    positive_class_cost = ((-y)' * log.(h))

    # right side of the cost function
    negative_class_cost = ((1 .- y)' * log.(1 .- h))

    # lambda effect
    lambda_regularization = (λ/(2*m) * sum(θ[2 : end] .^ 2))

    # Current batch cost
    𝐉 = (1/m) * (positive_class_cost - negative_class_cost) + lambda_regularization

    # Gradients for all the theta members with regularization except the constant
    ∇𝐉 = (1/m) * (X') * (h-y) + ((1/m) * (λ * θ))  

    ∇𝐉[1] = (1/m) * (X[:, 1])' * (h-y) # Exclude the constant

    return (𝐉, ∇𝐉)
end
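Note that `sigmoid` is not defined in the snippet above and is not part of Base Julia; a minimal definition (assuming the standard logistic function is what's intended) could be:

```julia
# Logistic function; the broadcast dots let it accept scalars and vectors alike
sigmoid(z) = 1 ./ (1 .+ exp.(-z))
```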

I would like to use the LBFGS algorithm as a solver to find the best weights that minimize this function, based on my training examples and labels, which are defined as:

opt_train = [ones(size(X_train_scaled, 1)) X_train_scaled] # added intercept
initial_theta = zeros(size(opt_train, 2))

Having read the documentation, here's my current attempt, which is not working:

res = optimize(b -> regularised_cost(opt_train, y_train, initial_theta, 0.01),
               method=LBFGS(),
               Optim.Options(show_trace=true, iterations = 1000))

How do I pass my training examples and labels along with the gradients so that the solver (LBFGS) can find me the best values for theta?

I have used Optim only a few times, and long ago, but here's my guess: it seems that you are passing a constant function of b to it; maybe you meant passing

b -> regularised_cost(opt_train, y_train, b, 0.01)

Still, you probably need to provide the initial iterate somehow.


I tried that suggestion:

res = optimize(b -> regularised_cost(opt_train, y_train, b, 0.01),
               method=LBFGS(), iterations = 1000)

I got this error:

MethodError: no method matching optimize(::var"#13#14"; method=LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}(10, LineSearches.InitialStatic{Float64}
  alpha: Float64 1.0
  scaled: Bool false
, LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
  mayterminate: Base.RefValue{Bool}
, nothing, Optim.var"#19#21"(), Flat(), true), iterations=1000)
Closest candidates are:
  optimize(::F, !Matched::T, !Matched::T; method, rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at /Users/mysterio/.julia/packages/Optim/Q2XsG/src/univariate/optimize/interface.jl:14
  optimize(::F, !Matched::T, !Matched::T, !Matched::GoldenSection; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace, nargs...) where {F<:Function, T<:AbstractFloat} at /Users/./.julia/packages/Optim/Q2XsG/src/univariate/solvers/golden_section.jl:54
  optimize(::F, !Matched::T, !Matched::T, !Matched::Brent; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at /Users/./.julia/packages/Optim/Q2XsG/src/univariate/solvers/brent.jl:58 got unsupported keyword argument "method"

This is the line I use to do exactly that in MLJLinearModels:

https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/aa7c4a93047e0fb8e9ff99e8105bfca79e87cba0/src/fit/newton.jl#L62

Note that the line before it uses tricks explained in the Optim docs to define an `only_fg!`, which returns the cost and computes the gradient in place.


I guess I have to somehow find a way to follow that workflow.

Here’s the doc I’m referring to, which I think explains it pretty well: Optim.jl


I’m not sure that interface is still valid (your keywords…)

If you didn’t return your gradient, you could try

fun(b) = regularised_cost(opt_train, y_train, b, 0.01)
res = optimize(fun, initial_theta, LBFGS(), Optim.Options(iterations=1000))

but that would be assuming you were passing in a function that only returned your objective value. As you can see from the docs linked above, you can use the only_fg! version like this:

function fg!(F,G,x)
  # do common computations here
  # ...
  if G != nothing
    # code to compute gradient here
    # writing the result to the vector G
  end
  if F != nothing
    # value = ... code to compute objective function
    return value
  end
end

So, in your case, it could be something like:

function regularised_cost(F, G, θ, X, y, λ)
    m = length(y)

    # Sigmoid predictions
    h = sigmoid(X * θ)

    # left side of the cost function
    positive_class_cost = ((-y)' * log.(h))

    # right side of the cost function
    negative_class_cost = ((1 .- y)' * log.(1 .- h))

    # lambda effect
    lambda_regularization = (λ/(2*m) * sum(θ[2 : end] .^ 2))
    if G !== nothing
        # Gradients for all the theta members with regularization except the constant
        G .= (1/m) * (X') * (h-y) + ((1/m) * (λ * θ))

        G[1] = (1/m) * (X[:, 1])' * (h-y) # Exclude the constant
    end
    if F != nothing
        # Current batch cost
        𝐉 = (1/m) * (positive_class_cost - negative_class_cost) + lambda_regularization
        return 𝐉
    end
    return nothing
end

fun(F, G, b) = regularised_cost(F, G, b, opt_train, y_train, 0.01)
res = optimize(Optim.only_fg!(fun), initial_theta, LBFGS(), Optim.Options(iterations=1000))

or… you might get lucky, and https://github.com/frapac/LogisticOptTools.jl will get registered as a package :slight_smile:
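As a sanity check of the gradient logic, here is a self-contained sketch (the toy data below is made up for illustration) that compares the analytic gradient written into `G` against central finite differences of the cost:

```julia
sigmoid(z) = 1 ./ (1 .+ exp.(-z))   # assumed helper, not in Base

function regularised_cost(F, G, θ, X, y, λ)
    m = length(y)
    h = sigmoid(X * θ)
    if G !== nothing
        # Regularised gradient; the intercept θ[1] is overwritten unregularised below
        G .= (1/m) * X' * (h - y) + (λ/m) * θ
        G[1] = (1/m) * (X[:, 1])' * (h - y)
    end
    if F !== nothing
        # Cross-entropy cost plus L2 penalty (excluding the intercept)
        return (1/m) * ((-y)' * log.(h) - (1 .- y)' * log.(1 .- h)) +
               (λ/(2m)) * sum(θ[2:end] .^ 2)
    end
    return nothing
end

# Toy data: 4 examples, intercept column plus one feature
X = [1.0 0.5; 1.0 -1.2; 1.0 2.0; 1.0 0.1]
y = [1.0, 0.0, 1.0, 0.0]
θ = [0.1, -0.3]
λ = 0.01

# Analytic gradient
G = similar(θ)
regularised_cost(nothing, G, θ, X, y, λ)

# Central finite differences of the cost alone
ε = 1e-6
fd = similar(θ)
for i in eachindex(θ)
    e = zeros(length(θ)); e[i] = ε
    fd[i] = (regularised_cost(true, nothing, θ + e, X, y, λ) -
             regularised_cost(true, nothing, θ - e, X, y, λ)) / (2ε)
end
```

If `G` and `fd` agree to roughly the finite-difference tolerance, the gradient branch is consistent with the cost branch.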


I will look into all these options and share my findings as well.
