# How to use LBFGS to minimise the cost function for regularised logistic regression

I am stuck trying to use the Optim package in Julia to minimize a cost function: the cost function for L2-regularised logistic regression. It is constructed as follows:

```julia
using Optim

# Assumes a sigmoid helper such as:
sigmoid(z) = 1 ./ (1 .+ exp.(-z))

function regularised_cost(X, y, θ, λ)
    m = length(y)

    # Sigmoid predictions
    h = sigmoid(X * θ)

    # Left side of the cost function
    positive_class_cost = (-y)' * log.(h)

    # Right side of the cost function
    negative_class_cost = (1 .- y)' * log.(1 .- h)

    # Regularisation term (the intercept θ[1] is not regularised)
    lambda_regularization = (λ / (2m)) * sum(θ[2:end] .^ 2)

    # Current batch cost
    𝐉 = (1/m) * (positive_class_cost - negative_class_cost) + lambda_regularization

    # Gradients for all the theta members, with regularisation
    ∇𝐉 = (1/m) * X' * (h - y) + (λ/m) * θ

    # Overwrite only the intercept entry, which is not regularised
    ∇𝐉[1] = (1/m) * X[:, 1]' * (h - y)

    return (𝐉, ∇𝐉)
end
```
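As a quick sanity check of a cost/gradient pair like this, the analytic gradient can be compared against a finite difference of the cost. This is a sketch with hypothetical toy data (any small `X` and `y ∈ {0, 1}` will do); a disagreement, or an indexing error, signals a bug in the gradient code:

```julia
# Hypothetical toy data: 3 examples, intercept column plus one feature
X = [1.0 0.5; 1.0 -0.3; 1.0 0.8]
y = [1.0, 0.0, 1.0]
θ = [0.1, -0.2]
λ = 0.01

J, ∇J = regularised_cost(X, y, θ, λ)

# Forward finite difference in each coordinate; the two numbers
# should roughly agree if the analytic gradient is correct.
ϵ = 1e-6
for j in 1:length(θ)
    e = zeros(length(θ)); e[j] = ϵ
    J₊, _ = regularised_cost(X, y, θ .+ e, λ)
    println("θ[$j]: finite diff = ", (J₊ - J) / ϵ, ", analytic = ", ∇J[j])
end
```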

I would like to use the LBFGS algorithm as a solver to find the best weights that minimize this function, based on my training examples and labels, which are defined as:

```julia
opt_train = [ones(size(X_train_scaled, 1)) X_train_scaled] # added intercept
initial_theta = zeros(size(opt_train, 2))
```

Having read the documentation, here’s my current implementation, which is not working:

```julia
res = optimize(b -> regularised_cost(opt_train, y_train, initial_theta, 0.01),
               method=LBFGS(),
               Optim.Options(show_trace=true, iterations=1000))
```

How do I pass my training examples and labels along with the gradients so that the solver (LBFGS) can find me the best values for theta?

I have used Optim only a few times, and that was long ago, but here’s my guess: it seems that you are passing a constant function of `b` to it; maybe you meant passing

```julia
b -> regularised_cost(opt_train, y_train, b, 0.01)
```

Still, you probably need to provide the initial iterate somehow.


I tried that suggestion:

```julia
res = optimize(b -> regularised_cost(opt_train, y_train, b, 0.01),
               method=LBFGS(), iterations=1000)
```

I got this error:

```text
MethodError: no method matching optimize(::var"#13#14"; method=LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}(10, LineSearches.InitialStatic{Float64}
  alpha: Float64 1.0
  scaled: Bool false
, LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
  mayterminate: Base.RefValue{Bool}
, nothing, Optim.var"#19#21"(), Flat(), true), iterations=1000)
Closest candidates are:
  optimize(::F, !Matched::T, !Matched::T; method, rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at /Users/mysterio/.julia/packages/Optim/Q2XsG/src/univariate/optimize/interface.jl:14
  optimize(::F, !Matched::T, !Matched::T, !Matched::GoldenSection; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace, nargs...) where {F<:Function, T<:AbstractFloat} at /Users/./.julia/packages/Optim/Q2XsG/src/univariate/solvers/golden_section.jl:54
  optimize(::F, !Matched::T, !Matched::T, !Matched::Brent; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at /Users/./.julia/packages/Optim/Q2XsG/src/univariate/solvers/brent.jl:58 got unsupported keyword argument "method"
```

This is the line I use to do exactly that in MLJLinearModels:

https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/aa7c4a93047e0fb8e9ff99e8105bfca79e87cba0/src/fit/newton.jl#L62

Note that the line before it uses tricks explained in the Optim docs to define an `only_fg!`, which returns the function cost and computes the gradient in place.


I guess I have to somehow find a way to follow that workflow.

Here’s the doc I’m referring to, which I think explains it pretty well: Optim.jl


I’m not sure that interface is still valid (your keywords…)

If your function only returned the cost (no gradient), you could try

```julia
fun(b) = regularised_cost(opt_train, y_train, b, 0.01)
res = optimize(fun, initial_theta, LBFGS(), Optim.Options(iterations=1000))
```

but that assumes you are passing in a function that only returns your objective value. Since you compute the gradient as well, you can use the `only_fg!` version, as described in the docs linked above:

```julia
function fg!(F, G, x)
    # do common computations here
    # ...
    if G !== nothing
        # code to compute gradient here
        # writing the result to the vector G
    end
    if F !== nothing
        # value = ... code to compute objective function
        return value
    end
end
```
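This template is then wrapped in `Optim.only_fg!` when calling `optimize`; a minimal sketch, assuming an `fg!` as above and a hypothetical starting point `x0`:

```julia
using Optim

# x0 is a hypothetical initial iterate with one entry per parameter
x0 = zeros(n_params)
res = optimize(Optim.only_fg!(fg!), x0, LBFGS(), Optim.Options(iterations=1000))
```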

So, in your case, it could be something like

```julia
function regularised_cost(F, G, θ, X, y, λ)
    m = length(y)

    # Sigmoid predictions (common to both the cost and the gradient)
    h = sigmoid(X * θ)

    if G !== nothing
        # Gradients for all the theta members, with regularisation,
        # written into G in place
        G .= (1/m) * X' * (h - y) + (λ/m) * θ

        # Overwrite only the intercept entry, which is not regularised
        G[1] = (1/m) * X[:, 1]' * (h - y)
    end
    if F !== nothing
        # Left and right sides of the cost function
        positive_class_cost = (-y)' * log.(h)
        negative_class_cost = (1 .- y)' * log.(1 .- h)

        # Regularisation term (the intercept θ[1] is not regularised)
        lambda_regularization = (λ / (2m)) * sum(θ[2:end] .^ 2)

        # Current batch cost
        return (1/m) * (positive_class_cost - negative_class_cost) + lambda_regularization
    end
    return nothing
end

fun(F, G, b) = regularised_cost(F, G, b, opt_train, y_train, 0.01)
res = optimize(Optim.only_fg!(fun), initial_theta, LBFGS(), Optim.Options(iterations=1000))
```

or… you might get lucky, and https://github.com/frapac/LogisticOptTools.jl will get registered as a package
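Once the call returns, the fitted weights come out of the result object; a sketch, assuming `res` from the `optimize` call above:

```julia
θ̂ = Optim.minimizer(res)   # best θ found by LBFGS
Optim.minimum(res)          # cost at θ̂
Optim.converged(res)        # whether the solver reported convergence
```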

I will look into all these options. I will share my findings as well.
