Can't seem to get Optimization.jl to converge to a solution in a Monte Carlo simulation, but the objective function seems to work fine otherwise

A basic binary logit works fine:

using Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, Statistics

function SimulateBinaryLogit(x, Beta)
  N = size(x, 1)
  J = size(Beta, 1)
  epsilon = -log.(-log.(rand(N, J)))   # Gumbel(0, 1) draws via the inverse CDF
  Beta_augmented = [Beta zeros(2)]     # utility of alternative 2 normalised to zero
  utility = x * Beta_augmented + epsilon
  (maxvalue, maxindex) = findmax(utility, dims=2)
  choice = getindex.(maxindex, 2)      # index of the utility-maximising alternative
  y = choice .== 1                     # true if alternative 1 was chosen
end

function BinaryLogitLL(u, p)
  x = p[:, 1:2]
  y = p[:, 3]
  Lambda_xb = exp.(x * u) ./ (1 .+ exp.(x * u))   # logistic CDF of x*u
  ll_i = y .* log.(Lambda_xb) + (1 .- y) .* log.(1 .- Lambda_xb)
  LL = -sum(ll_i)                                  # negative log likelihood
end

Random.seed!(1)
N = 1000
Beta = [0.5; 0.5]
income = randn(N, 1)
x = [ones(N, 1) income]
y = SimulateBinaryLogit(x, Beta)
p = [x y]
u0 = [0.0; 0.0]

optf = OptimizationFunction(BinaryLogitLL, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u0, p, lb = [-10.0, -10.0], ub = [10.0, 10.0])
sol = solve(prob, BFGS())
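
A quick sanity check (the loose tolerance is an arbitrary choice of mine) should confirm the estimates land near the true values:

@show sol.u                                # should be close to the true Beta = [0.5, 0.5]
@assert isapprox(sol.u, Beta; atol = 0.3)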

Everything is fine so far. However, when I try to expand this to a Monte Carlo simulation:

struct input 
  y
  x
  R
end

function BinaryLogitSimulatedLL(u, p)
  y = p.y
  x = p.x
  R = p.R
  Random.seed!(1)                      # fix the draws across objective evaluations
  N = size(y, 1)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  for i in 1:R
    Simulated_y[:,i] = SimulateBinaryLogit(x, u)
  end
  SimProb = mean(Simulated_y, dims=2)  # simulated choice frequencies
  ll_i = y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb)
  LL = -sum(ll_i)
end

R = 1000
p = input(y,x,R)
u = [0.0; 0.0]

optf = OptimizationFunction(BinaryLogitSimulatedLL, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u, p, lb = -1.0*ones(2), ub = ones(2))
sol = solve(prob, BFGS())

It seems that whatever initial values I give, the routine tells me they are optimal, even though I set the true solution in advance to be [0.5; 0.5].

Try something that isn’t BFGS. For example, try OptimizationOptimisers.Adam() as the method.

I replaced the final line with

sol = solve(prob, ADAM(); maxiters = 1000)

and imported OptimizationOptimisers. Still the same result :frowning: Thanks for the suggestion, though. If you know where I can look for advice on the package generally not converging, or staying at whatever initial values are provided, I would be grateful.

The problem is not the optimizer. The problem is that your BinaryLogitSimulatedLL is non-differentiable and ForwardDiff returns a gradient of [0.0, 0.0]:

julia> ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.5, 0.5])
2-element Vector{Float64}:
 0.0
 0.0

julia> ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.0, 0.0])
2-element Vector{Float64}:
 0.0
 0.0

I really don’t understand why an error doesn’t get thrown.

I think it’s because the choice .== 1 at the end of SimulateBinaryLogit returns a zero matrix rather than erroring:

julia> ForwardDiff.jacobian(x -> SimulateBinaryLogit(p.x, x), [0.0, 0.0])
1000×2 BitMatrix:
 0  0
 0  0
...
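
A one-dimensional version of the same effect (my own minimal example): any comparison collapses the dual number to a plain Bool, so no derivative information survives and ForwardDiff just reports zero:

julia> ForwardDiff.derivative(x -> Float64(x > 0), 1.0)
0.0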

Thank you odow. I feel so stupid :cry: I didn’t write the call to the simulation function properly and need to change my input struct to include the beta value that influences what the output vector looks like. I also appreciate you giving me a method to troubleshoot where I went wrong, for future reference.


I feel so stupid

Don’t! This looks highly non-trivial…

I didn’t write the call to the simulation function properly and need to change my input struct to include the beta value that influences what the output vector looks like

To clarify, the problem is that an optimizer like BFGS needs to evaluate the gradient of your objective function. But your objective function involves discrete operations like findmax, which are non-differentiable, so you cannot compute a gradient of it.

You either need to:

  • Use a derivative free optimization method
  • Re-think how you are computing your objective so that it becomes differentiable (see the sketch below)
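
For concreteness, here is a rough, untested sketch of the second option for the binary case: fix the simulation draws (common random numbers) and replace the hard indicator 1{utility1 > utility2} with a logistic kernel, i.e. a logit-smoothed frequency simulator. The function name and the smoothing parameter tau below are my own choices:

function SmoothedBinaryLogitSimLL(u, p; tau = 0.1)
  y, x, R = p.y, p.x, p.R
  N = size(x, 1)
  rng = Random.MersenneTwister(1)                # fixed draws across evaluations
  SimProb = zeros(eltype(u), N)                  # eltype(u) so ForwardDiff duals survive
  for _ in 1:R
    eps = -log.(-log.(rand(rng, N, 2)))          # Gumbel draws for both alternatives
    udiff = x * u .+ eps[:, 1] .- eps[:, 2]      # utility difference, alt 1 minus alt 2
    SimProb .+= 1 ./ (1 .+ exp.(-udiff ./ tau))  # smooth stand-in for udiff .> 0
  end
  SimProb ./= R
  -sum(y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb))
end

As tau → 0 this approaches the hard-frequency simulator, but for any tau > 0 the duals flow through, so AutoForwardDiff gets a usable non-zero gradient.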
using Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, OptimizationOptimisers, Statistics

function SimulateBinaryLogit(x, Beta)
  # Random.seed!(1)
  N = size(x, 1)
  J = size(Beta, 1)
  epsilon = -log.(-log.(rand(N, J)))
  Beta_augmented = [Beta zeros(2)]
  utility = x * Beta_augmented + epsilon
  (maxvalue, maxindex) = findmax(utility, dims=2)
  choice = getindex.(maxindex,2)
  y = choice .== 1
  y = convert(Array{Float64}, y)   # return Float64 values instead of a BitMatrix
end

function BinaryLogitLL(u,p)
  x = p[:,1:2]
  y = p[:,3]
  Lambda_xb = exp.(x * u) ./ (1 .+ exp.(x * u))
  ll_i = y .* log.(Lambda_xb) + (1 .- y) .* log.(1 .- Lambda_xb)
  LL = -sum(ll_i)
end

function BinaryLogitSimulatedLL(u, p)
  y = p.y
  x = p.x
  R = p.R
  Beta = u
  Random.seed!(1)
  N = size(y, 1)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  for i in 1:R
    Simulated_y[:,i] = SimulateBinaryLogit(x, Beta)
  end
  SimProb = mean(Simulated_y, dims=2)
  ll_i = y .* log.(SimProb) .+ (1 .- y) .* log.(1 .- SimProb)
  LL = -sum(ll_i)
end

struct input 
  y
  x
  R
end

Random.seed!(1)
N = 1000
Beta = [0.5; 0.5]
income = randn(N, 1)
x = [ones(N, 1) income]
y = SimulateBinaryLogit(x, Beta)
R = 1000
p = input(y,x,R)
u = [0.0; 0.0]

ForwardDiff.gradient(x -> BinaryLogitSimulatedLL(x, p), [0.0; 0.0])   # still [0.0, 0.0]
p = [x y]
ForwardDiff.gradient(x -> BinaryLogitLL(x, p), [0.0; 0.0])            # nonzero gradient

I converted the discrete function output to Float64 and still feel no closer to making my original problem work. I am confused by the fact that the earlier (analytic) function has no problem with the gradient calculation but the simulated one does.

As @odow pointed out, it’s a non-differentiable loss; try NelderMead() or SAMIN() as the optimizer.
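
For example (untested; both solvers come via OptimizationOptimJL, and SAMIN requires the finite box bounds your prob already has):

sol = solve(prob, NelderMead(); maxiters = 10_000)
sol = solve(prob, SAMIN(); maxiters = 50_000)   # simulated annealing copes better with flat regions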

I guess the problem comes from adding more dimensions to the objective function, because the objective function I am truly interested in has five unknowns:

using Distributions, StatsBase, Optimization, OptimizationOptimJL, OptimizationBBO, OptimizationMOI, ForwardDiff, ModelingToolkit, Random, OptimizationOptimisers, Statistics

function SimulateMNProbit(x, Betavec, Omega)
  N = size(x, 1)
  K = size(x, 2)
  J = size(Betavec, 1) ÷ K + 1        # number of alternatives
  Beta = reshape(Betavec, K, J - 1)   # one coefficient column per non-base alternative
  xb = x * Beta
  # utilities relative to the base alternative, with correlated normal errors
  udiff = [zeros(N, 1) xb + rand(MvNormal(zeros(J - 1), Omega), N)']
  (maxvalue, maxindex) = findmax(udiff, dims=2)
  choice = getindex.(maxindex, 2)     # index of the utility-maximising alternative
end

function MNProbitSimulatedLL(param, y, x, R)
  BetaV = param[1:end - 1]
  Rho = param[end]
  Omega = [1 Rho; Rho 1]
  Random.seed!(1)                     # fix the draws across objective evaluations
  N = size(y, 1)
  J = maximum(y)
  Simulated_y = Array{Float64, 2}(undef, N, R)
  SimulatedProb = Array{Float64, 2}(undef, N, J)
  MyIndex = Array{Float64, 2}(undef, N, J)
  for count = 1:R
    Simulated_y[:, count] = SimulateMNProbit(x, BetaV, Omega)
  end
  for j = 1:J
    SimulatedProb[:, j] = mean(Simulated_y .== j, dims=2)   # simulated choice frequencies
    MyIndex[:, j] = y .== j
  end
  ll_i = sum(MyIndex .* log.(SimulatedProb), dims=2)
  LL = -sum(ll_i)
end

function MNProbitSimulatedLLu(u, p)
  # the same likelihood, with the (u, p) signature that Optimization.jl expects
  MNProbitSimulatedLL(u, p.y, p.x, p.R)
end

struct input 
  y
  x
  R
end

Random.seed!(1)
N = 1000
Beta = 0.5*ones(4)
income = rand(N, 1)
x      = [ones(N, 1) income] 
omega = [1 0.5;0.5 1];
y = SimulateMNProbit(x, Beta, omega)
R = 100
p = input(y, x, R)
u =  0.5*ones(5)

LL = MNProbitSimulatedLL(u, y, x, R)   # likelihood at the true parameters [Beta; 0.5]

lb = [-10;-10;-10;-10;-1.0]
ub = [10;10;10;10;1.0]

optf = OptimizationFunction(MNProbitSimulatedLLu, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, u, p, lb = lb, ub = ub)

sol = solve(prob, NelderMead())

The above code does not converge at all, but it does seem to be written correctly, in that if I input the theoretically optimal values the log likelihood is lower than for any other input.

If I am unable to use a gradient-based approach, I would appreciate some general tips on what to do with a locally flat function in Julia. In MATLAB the DiffMinChange parameter in fmincon can be adjusted, but I don’t know if something like that is possible in Julia.
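
One way to mimic DiffMinChange that I can think of (my own untested construction): pass a hand-rolled central-difference gradient to OptimizationFunction through its grad keyword, with a step h large enough to actually move the simulated frequencies. If I have the interface right, a user-supplied grad takes (G, u, p) and fills G in place:

function big_step_grad!(G, u, p; h = 0.05)
  for i in eachindex(u)
    up = copy(u); up[i] += h
    dn = copy(u); dn[i] -= h
    G[i] = (MNProbitSimulatedLLu(up, p) - MNProbitSimulatedLLu(dn, p)) / (2h)
  end
  # caveat: near the bounds this can push Rho outside (-1, 1) and break MvNormal
end

optf = OptimizationFunction(MNProbitSimulatedLLu; grad = big_step_grad!)
prob = OptimizationProblem(optf, u, p, lb = lb, ub = ub)
sol = solve(prob, LBFGS())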

Just by skimming, it seems you are doing statistical model fitting? In that case, recast your problem as a Bayesian inference and sample from the posterior. If the likelihood is locally flat you will get samples over a wide range of parameter values, and this will better reflect the reality of your problem than any one particular point estimate.
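
In that spirit, a bare-bones sketch of what that could look like here without bringing in a PPL: a random-walk Metropolis sampler over the simulated log likelihood, with a flat prior on the box. Everything below (names, step size, iteration count) is my own untested illustration:

function rw_metropolis(loglike, u0, lb, ub; iters = 20_000, step = 0.05)
  u = copy(u0)
  lp = loglike(u)
  chain = Vector{typeof(u)}()
  for _ in 1:iters
    cand = u .+ step .* randn(length(u))
    # flat prior on the box: anything outside the bounds is rejected
    lpc = all(lb .<= cand .<= ub) ? loglike(cand) : -Inf
    if log(rand()) < lpc - lp     # Metropolis accept/reject
      u, lp = cand, lpc
    end
    push!(chain, copy(u))
  end
  chain
end

chain = rw_metropolis(uu -> -MNProbitSimulatedLLu(uu, p), u, lb, ub)

The spread of the chain then shows how flat the simulated likelihood really is around the estimates.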
