Can't seem to get Optimization.jl to converge to a solution in a Monte Carlo simulation, but the objective function seems to work fine otherwise

A basic binary logit works fine:

using Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, Statistics

function SimulateBinaryLogit(x, Beta)
  N = size(x, 1)
  J = size(Beta, 1)
  epsilon = -log.(-log.(rand(N, J)))   # Gumbel(0, 1) draws via the inverse CDF
  Beta_augmented = [Beta zeros(2)]     # utility of alternative 2 normalised to zero
  utility = x * Beta_augmented + epsilon
  (maxvalue, maxindex) = findmax(utility, dims=2)
  choice = getindex.(maxindex, 2)      # index of the utility-maximising alternative
  y = choice .== 1                     # true if alternative 1 was chosen
end

function BinaryLogitLL(u, p)
  x = p[:, 1:2]
  y = p[:, 3]
  Lambda_xb = exp.(x * u) ./ (1 .+ exp.(x * u))   # logistic CDF of x*u
  ll_i = y .* log.(Lambda_xb) + (1 .- y) .* log.(1 .- Lambda_xb)
  LL = -sum(ll_i)                                  # negative log likelihood
end

Random.seed!(1)
N = 1000
Beta = [0.5; 0.5]
income = randn(N, 1)
x = [ones(N, 1) income]
y = SimulateBinaryLogit(x, Beta)
p = [x y]
u0 = [0.0; 0.0]

optf = OptimizationFunction(BinaryLogitLL, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u0, p, lb = [-10.0, -10.0], ub = [10.0, 10.0])
sol = solve(prob, BFGS())
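
A quick sanity check (the loose tolerance is an arbitrary choice of mine) should confirm the estimates land near the true values:

@show sol.u                                # should be close to the true Beta = [0.5, 0.5]
@assert isapprox(sol.u, Beta; atol = 0.3)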

Everything is fine so far. However, when I try to expand this to a Monte Carlo simulation:

struct input 
  y
  x
  R
end

function BinaryLogitSimulatedLL(u, p)
  y = p.y
  x = p.x
  R = p.R
  Random.seed!(1)                      # fix the draws across objective evaluations
  N = size(y, 1)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  for i in 1:R
    Simulated_y[:,i] = SimulateBinaryLogit(x, u)
  end
  SimProb = mean(Simulated_y, dims=2)  # simulated choice frequencies
  ll_i = y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb)
  LL = -sum(ll_i)
end

R = 1000
p = input(y,x,R)
u = [0.0; 0.0]

optf = OptimizationFunction(BinaryLogitSimulatedLL, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u, p, lb = -1.0*ones(2), ub = ones(2))
sol = solve(prob, BFGS())

It seems that whatever initial values I give, the routine tells me they are optimal, even though I set the true solution in advance to be [0.5; 0.5].

Try something that isn’t BFGS. For example, try OptimizationOptimisers.Adam() as the method.

I replaced the final line with

sol = solve(prob, ADAM(); maxiters = 1000)

and imported OptimizationOptimisers. Still the same result :frowning: Thanks for the suggestion, though. If you know where I can look for advice on the package generally not converging, or staying at whatever initial values are provided, I would be grateful.

The problem is not the optimizer. The problem is that your BinaryLogitSimulatedLL is non-differentiable and ForwardDiff returns a gradient of [0.0, 0.0]:

julia> ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.5, 0.5])
2-element Vector{Float64}:
 0.0
 0.0

julia> ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.0, 0.0])
2-element Vector{Float64}:
 0.0
 0.0

I really don’t understand why an error doesn’t get thrown.

I think it’s because the choice .== 1 at the end of SimulateBinaryLogit returns a zero matrix rather than erroring:

julia> ForwardDiff.jacobian(x -> SimulateBinaryLogit(p.x, x), [0.0, 0.0])
1000×2 BitMatrix:
 0  0
 0  0
...
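
A one-dimensional version of the same effect (my own minimal example): any comparison collapses the dual number to a plain Bool, so no derivative information survives and ForwardDiff just reports zero:

julia> ForwardDiff.derivative(x -> Float64(x > 0), 1.0)
0.0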

Thank you odow. I feel so stupid :cry: I didn’t write the call to the simulation function properly and need to change my input struct to include the beta value that influences what the output vector looks like. I also appreciate you giving me a method to troubleshoot where I went wrong, for future reference.


I feel so stupid

Don’t! This looks highly non-trivial…

I didn’t write the call to the simulation function properly and need to change my input struct to include the beta value that influences what the output vector looks like

To clarify, the problem is that an optimizer like BFGS needs to evaluate the gradient of your objective function. But your objective function involves discrete operations like findmax, which are non-differentiable, so you cannot compute a gradient of it.

You either need to:

  • Use a derivative free optimization method
  • Re-think how you are computing your objective so that it becomes differentiable (see the sketch below)
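
For concreteness, here is a rough, untested sketch of the second option for the binary case: fix the simulation draws (common random numbers) and replace the hard indicator 1{utility1 > utility2} with a logistic kernel, i.e. a logit-smoothed frequency simulator. The function name and the smoothing parameter tau below are my own choices:

function SmoothedBinaryLogitSimLL(u, p; tau = 0.1)
  y, x, R = p.y, p.x, p.R
  N = size(x, 1)
  rng = Random.MersenneTwister(1)                # fixed draws across evaluations
  SimProb = zeros(eltype(u), N)                  # eltype(u) so ForwardDiff duals survive
  for _ in 1:R
    eps = -log.(-log.(rand(rng, N, 2)))          # Gumbel draws for both alternatives
    udiff = x * u .+ eps[:, 1] .- eps[:, 2]      # utility difference, alt 1 minus alt 2
    SimProb .+= 1 ./ (1 .+ exp.(-udiff ./ tau))  # smooth stand-in for udiff .> 0
  end
  SimProb ./= R
  -sum(y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb))
end

As tau → 0 this approaches the hard-frequency simulator, but for any tau > 0 the duals flow through, so AutoForwardDiff gets a usable non-zero gradient.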
using Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, OptimizationOptimisers, Statistics

function SimulateBinaryLogit(x, Beta)
  # Random.seed!(1)
  N = size(x, 1)
  J = size(Beta, 1)
  epsilon = -log.(-log.(rand(N, J)))
  Beta_augmented = [Beta zeros(2)]
  utility = x * Beta_augmented + epsilon
  (maxvalue, maxindex) = findmax(utility, dims=2)
  choice = getindex.(maxindex,2)
  y = choice .== 1
  y = convert(Array{Float64}, y)   # return Float64 values instead of a BitMatrix
end

function BinaryLogitLL(u,p)
  x = p[:,1:2]
  y = p[:,3]
  Lambda_xb = exp.(x * u) ./ (1 .+ exp.(x * u))
  ll_i = y .* log.(Lambda_xb) + (1 .- y) .* log.(1 .- Lambda_xb)
  LL = -sum(ll_i)
end

function BinaryLogitSimulatedLL(u, p)
  y = p.y
  x = p.x
  R = p.R
  Beta = u
  Random.seed!(1)
  N = size(y, 1)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  for i in 1:R
    Simulated_y[:,i] = SimulateBinaryLogit(x, Beta)
  end
  SimProb = mean(Simulated_y, dims=2)
  ll_i = y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb)
  LL = -sum(ll_i)
end

struct input 
  y
  x
  R
end

Random.seed!(1)
N = 1000
Beta = [0.5; 0.5]
income = randn(N, 1)
x = [ones(N, 1) income]
y = SimulateBinaryLogit(x, Beta)
R = 1000
p = input(y,x,R)
u = [0.0; 0.0]

ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.0; 0.0])   # still [0.0, 0.0]
p = [x y]
ForwardDiff.gradient(x -> BinaryLogitLL(x, p), [0.0; 0.0])            # nonzero gradient

I converted the discrete function output to Float64 and still feel no closer to making my original problem work. I am confused by the fact that the earlier (analytic) function has no problem with the gradient calculation but the simulated one does.

As @odow pointed out, it’s a non-differentiable loss; try NelderMead() or SAMIN() as the optimizer.
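
For example (untested; both solvers come via OptimizationOptimJL, and SAMIN requires the finite box bounds your prob already has):

sol = solve(prob, NelderMead(); maxiters = 10_000)
sol = solve(prob, SAMIN(); maxiters = 50_000)   # simulated annealing copes better with flat regions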

I guess the problem comes from adding more dimensions to the objective function, because the objective function I am truly interested in has five unknowns:

using Distributions, StatsBase, Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, OptimizationOptimisers, Statistics

function SimulateMNProbit(x, Betavec, Omega)
  N = size(x, 1)
  K = size(x, 2)
  J = size(Betavec, 1) ÷ K + 1        # number of alternatives
  Beta = reshape(Betavec, K, J - 1)   # one coefficient column per non-base alternative
  xb = x * Beta
  # utilities relative to the base alternative, with correlated normal errors
  udiff = [zeros(N, 1) xb + rand(MvNormal(zeros(J - 1), Omega), N)']
  (maxvalue, maxindex) = findmax(udiff, dims=2)
  choice = getindex.(maxindex, 2)     # index of the utility-maximising alternative
end

function MNProbitSimulatedLL(param, y, x, R)
  BetaV = param[1:end - 1]
  Rho = param[end]
  Omega = [1 Rho; Rho 1]
  Random.seed!(1)                     # fix the draws across objective evaluations
  N = size(y, 1)
  J = maximum(y)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  SimulatedProb = Array{Float64, 2}(undef, N, J)
  MyIndex = Array{Float64, 2}(undef, N, J)
  for count = 1:R
    Simulated_y[:, count] = SimulateMNProbit(x, BetaV, Omega)
  end
  for j = 1:J
    SimulatedProb[:, j] = mean(Simulated_y .== j, dims=2)   # simulated choice frequencies
    MyIndex[:, j] = y .== j
  end
  ll_i = sum(MyIndex .* log.(SimulatedProb), dims=2)
  LL = -sum(ll_i)
end

function MNProbitSimulatedLLu(u, p)
  # the same likelihood, with the (u, p) signature that Optimization.jl expects
  MNProbitSimulatedLL(u, p.y, p.x, p.R)
end

struct input 
  y
  x
  R
end

Random.seed!(1)
N = 1000
Beta = 0.5*ones(4)
income = rand(N, 1)
x      = [ones(N, 1) income] 
omega = [1 0.5;0.5 1];
y = SimulateMNProbit(x, Beta, omega)
R = 100
p = input(y, x, R)
u =  0.5*ones(5)

LL = MNProbitSimulatedLL(u, y, x, R)   # likelihood at the true parameters [Beta; 0.5]

lb = [-10;-10;-10;-10;-1.0]
ub = [10;10;10;10;1.0]

optf = OptimizationFunction(MNProbitSimulatedLLu, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u, p, lb = lb, ub = ub)

sol = solve(prob, NelderMead())

The above code does not converge at all, but it does seem to be written correctly, in that if I input the theoretically optimal values the log likelihood is lower than for any other input.

If I am unable to use a gradient-based approach, I would appreciate some general tips on what to do with a locally flat function in Julia. In MATLAB the DiffMinChange parameter in fmincon can be adjusted, but I don’t know if something like that is possible in Julia.
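
One way to mimic DiffMinChange that I can think of (my own untested construction): pass a hand-rolled central-difference gradient to OptimizationFunction through its grad keyword, with a step h large enough to actually move the simulated frequencies. If I have the interface right, a user-supplied grad takes (G, u, p) and fills G in place:

function big_step_grad!(G, u, p; h = 0.05)
  for i in eachindex(u)
    up = copy(u); up[i] += h
    dn = copy(u); dn[i] -= h
    G[i] = (MNProbitSimulatedLLu(up, p) - MNProbitSimulatedLLu(dn, p)) / (2h)
  end
  # caveat: near the bounds this can push Rho outside (-1, 1) and break MvNormal
end

optf = OptimizationFunction(MNProbitSimulatedLLu; grad = big_step_grad!)
prob = OptimizationProblem(optf, u, p, lb = lb, ub = ub)
sol = solve(prob, LBFGS())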

Just by skimming, it seems you are doing statistical model fitting? In that case, recast your problem as a Bayesian inference and sample from the posterior. If the likelihood is locally flat you will get samples over a wide range of parameter values, and this will better reflect the reality of your problem than any one particular point estimate.
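
In that spirit, a bare-bones sketch of what that could look like here without bringing in a PPL: a random-walk Metropolis sampler over the simulated log likelihood, with a flat prior on the box. Everything below (names, step size, iteration count) is my own untested illustration:

function rw_metropolis(loglike, u0, lb, ub; iters = 20_000, step = 0.05)
  u = copy(u0)
  lp = loglike(u)
  chain = Vector{typeof(u)}()
  for _ in 1:iters
    cand = u .+ step .* randn(length(u))
    # flat prior on the box: anything outside the bounds is rejected
    lpc = all(lb .<= cand .<= ub) ? loglike(cand) : -Inf
    if log(rand()) < lpc - lp     # Metropolis accept/reject
      u, lp = cand, lpc
    end
    push!(chain, copy(u))
  end
  chain
end

chain = rw_metropolis(uu -> -MNProbitSimulatedLLu(uu, p), u, lb, ub)

The spread of the chain then shows how flat the simulated likelihood really is around the estimates.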
