# How to compute hessian of simulated mle

I need to estimate parameters (a,b) and their corresponding standard errors using simulated MLE. I can successfully estimate the parameters, but I fail to get the SEs. In particular, when I try to obtain the hessian from the optimization using automatic differentiation, it gives me a zero matrix. Since the SE is the square root of the inverse of the hessian, I am unable to compute the SEs using this method. Can you please let me know if there is something wrong with how I am calling the hessian?

I first present the algorithm in words, I then present the implementation in Julia.

The algorithm for the SMLE is as follows:

1. We are given data for 10 types of individuals i for some share s^{data}_i and some variable Y_i. We need the relationship between s^{data}_i and Y_i. To do so we execute the following:

2. For each type of individual i, simulate Y_{ij}, u_{ij} for 10,000 individuals j as follows. For each j, draw (i) Y_{ij} according to some pre-determined rule, and (ii) make a uniform draw u_{i,j} between 0 and 1.

3. Construct logit probabilities for each i,j as follows p_{i,j} = \frac{\exp (a + b\cdot Y_{i,j})}{1+\exp (a + b\cdot Y_{i,j})}

4. Assign s_{i,j}=1 if p_{i,j} \geq u_{i,j}, set s_{i,j} = 0 otherwise

5. Construct the simulated likelihood contribution for type i as l_i = \frac{1}{10000}\sum_j s_{i,j}

6. Construct the overall log likelihood as the weighted sum L = \sum_i \ s^{data}_i \times \ln l_i + (1-s^{data}_i )\ln(1-l_i)

7. Search for the parameters (a,b) which minimize the simulated likelihood and obtain their standard errors. I can estimate the parameters in Julia. However the SEs are undefined because Julia returns a zero matrix for the hessian which cannot be inverted.

Below is the implementation in Julia.

using Random, LinearAlgebra, Distributions, Optim, StatsBase
using Optim: converged, maximum, maximizer, minimizer, iterations
using DelimitedFiles
using Statistics
using SharedArrays
using Optim, NLSolversBase, Random
using LinearAlgebra: diag

Random.seed!(1234)

## define fake data and make some simulations for the estimation
draws_uniform = rand(Uniform(0,1), 10, 10000)
draws_y = cumsum(ones(Float64, 10)) .+ rand(Uniform(1,2), 10, 10000)

share_hasgood = [.1 .2 .3 .4 .5 .6 .7 .8 .9 .9]'

## construct share by (i) comparing logit probability p to draws_uniform, and (ii) summing over all columns with p > draws_uniform
function model_derived_share(t, draws_uniform, draws_y)
a = t[1]
b = t[2]

p = exp.(a .+ b .* draws_y) ./ (1 .+ exp.(a .+ b .* draws_y)) # probability indiv has good
s = p .> draws_uniform	# assign indiv as having or not having good by comparing model probability with uniform draw

return mean(s, dims=2)
end

# construct log likelihood by taking weighted sum of model-derived shares
loglike(model_prediction, actual_share) = -sum(log.(model_prediction) .* actual_share .+ (1 .- actual_share) .* log.(1 .- model_prediction))

# optimize by minimizing negative of the log likelihood
using Optim
params  = Optim.minimizer(optimize((t) -> loglike(model_derived_share(t, draws_uniform, draws_income), share_hasgood), [.5; .5]))
display(params)

# evaluate hessian of loglikelihood at optimum params = [a*;b*]
functwicediff = TwiceDifferentiable(t -> momentcond(model_derived_share(t, draws_uniform, draws_y), share_hasgood), [.5; .5]; autodiff=:forward)
numerical_hessian = hessian!(functwicediff, params)
display(numerical_hessian)

# compute standard error of parameters as square root of the inverse of the hessian
std_err = sqrt.(diag(inv(numerical_hessian)))


You can make model_derived_share a smooth function of t by getting rid of draws_uniform and s and just returning mean(p, dims=2).

If my understanding of the model is correct, your log likelihood isn’t quite right. It’s missing the contributions from people who chose the outside option. I think the log likelihood should be

L = \sum_i ( s^{data}_i \times \ln l_i + (1-s^{data}_i)\times \ln (1-l_i))

Good catch, you are correct that the likelihood should include the outside option. I will edit the question above with this.

However, taking the mean won’t work for my application. I need to assign s_{ij} \in \{0,1\} according to s = p .> draws_uniform because s_{ij} is meant to be an indicator for whether a good is in the choice set of (i,j). I did not mention this in MWE because this complicates things. The algorithm above tells me whether one good belongs to the choice set of (i,j). In practice, I do this for many goods to obtain the set of all goods that belong to the choice set of each simulated individual. I then estimate the discrete choice problem using these simulated choice sets to define the inclusive value. I do not produce all of this detail as it complicates things.

I am mainly wondering how to use auto diff to get the hessian for the choice set parameters above, and if that is not possible, if there is an alternative to estimate the SEs. For what its worth, Julia returns non-zero second derivatives for the preference parameters which I guess is expected since those parameters do enter the log likelihood smoothly.

The SML likelihood function for this model is discontinuous, so it is non-differentiable. There are approaches which approximate the step function with a kernel smoothed function, to make it differentiable. See McFadden’s 1989 Econometrica article, for example.

An alternative is to use Bayesian methods, following Chernozhukov and Hong, 2003. This is practical and relatively straightforward. An example for MSM is
Econometrics/22-SimulationBased.jl at main · mcreel/Econometrics · GitHub

Doing the same for simulated likelihood would not be too difficult. One would need to change the likelihood in the Turing model part following CH2003.

One other note: because the log likelihood is discontinuous, a derivative-based optimization routine is likely to converge to a local extremum. One could use a derivative free optimizer, but you would still have the differentiability problem for getting standard errors. The Bayesian version gets around both of those problems.

Thank you @mcreel . This is great. I will take a look at the references.

The algorithm I am working with is similar to Goeree (2008) in that their paper also involves simulating the choice set by comparing a logit probability to a uniform draw (https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA4158). Page 1067 in the appendix outlines their algorithm.

They mention simulating the information matrix to obtain the SEs. Is this similar to either of the methods you cited above?

Yes I am using Nelder-Mead which doesn’t use derivatives so that would seem to resolve the issue of local extrema.

That paper states, on page 1038, “Given the simulated choice set, I compute choice probabilities for each individual for each product and construct an importance sampler to smooth the simulated choice probabilities.” With this, the moment conditions are differentiable, so the standard extremum estimation GMM methods can be applied to get standard errors. This smoothing idea is one that started with McFadden, as far as I know. If you implement smoothing, then I think that automatic differentiation will work just fine to get the Hessian.