By the way, here's your code, extended to also use the ML code I referenced above.
using Distributions, Optim, NLSolversBase, LinearAlgebra, DataFrames, GLM, Random
#Data Generating Process: latent index 1 + 2.5x + epsilon, of which only the sign is observed
Random.seed!(123)
x = rand(Normal(), 5000)
epsilon = rand(Normal(), 5000)
y = 1.0 .+ 2.5 .* x .+ epsilon .> 0
#Probit negative log-likelihood, summed over observations (for Optim)
function probit(theta, data)
    res = data[:, 1:end-1] * theta
    q = 2 .* data[:, end] .- 1
    ll = cdf.(Normal(), q .* res)
    return -sum(log.(ll))
end
#Same model, but returning the vector of per-observation log-likelihoods
#(no summing), which is what mleresults() needs
function probit2(theta, data)
    res = data[:, 1:end-1] * theta
    q = 2 .* data[:, end] .- 1
    ll = cdf.(Normal(), q .* res)
    return log.(ll)
end
#data and initial guess
thetap = [0.5; 1.5]
datap = [ones(5000) x y]
#setup and run the optimization (Newton's method with forward-mode AD)
prob = TwiceDifferentiable(vars -> probit(vars, datap), thetap; autodiff = :forward)
prest = optimize(prob, thetap, Newton(), Optim.Options(show_trace=true))
#inverse Hessian form of the covariance: correct
a = sqrt.(diag(inv(hessian!(prob, Optim.minimizer(prest)))))
display(a)
#this was the problem: gradient!() returns the gradient of the summed objective,
#a single k-vector that is roughly zero at the optimum, so gra*gra' is rank one
#and cannot be inverted; the OPG estimator needs the per-observation scores instead
gra = gradient!(prob, Optim.minimizer(prest))
#display(gra*gra')
#b = sqrt.(diag(inv(gra*gra')))
#check against the GLM package
df = DataFrame(X = x, Y = y)
GLMprobit = glm(@formula(Y ~ X), df, Binomial(), ProbitLink())
display(GLMprobit)
#check mleresults() with its three covariance estimators
m = theta -> probit2(theta, datap)
mleresults(m, thetap, vc=1)
mleresults(m, thetap, vc=2)
mleresults(m, thetap, vc=3)
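For reference, and judging by the labels in the output below, the three vc options correspond to the standard ML covariance estimators. Writing $s_i(\hat\theta)$ for the score of observation $i$ and $H(\hat\theta)$ for the Hessian of the log-likelihood at the MLE,

$$
\hat V_{\mathrm{H}} = \left[-H(\hat\theta)\right]^{-1}, \qquad
\hat V_{\mathrm{OPG}} = \Big[\sum_i s_i(\hat\theta)\, s_i(\hat\theta)'\Big]^{-1}, \qquad
\hat V_{\mathrm{sand}} = \hat V_{\mathrm{H}} \Big[\sum_i s_i(\hat\theta)\, s_i(\hat\theta)'\Big] \hat V_{\mathrm{H}}.
$$

Under correct specification the information matrix equality makes the three agree asymptotically, which is why the three sets of standard errors below are so close.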
Running this gives the output below. It appears that GLM must be using the inverse Hessian form of the covariance; the slight differences in the results should just be due to different convergence tolerances, different ways of computing the Hessian, and so on, but the numbers are the same for all practical purposes.
Note that the key to estimating the information matrix is to have the model available without the summing or averaging (the probit2 function); a short sketch of computing the OPG and sandwich covariances by hand from those per-observation contributions appears after the output.
julia> include("ML.jl");
Iter Function value Gradient norm
0 1.424836e+03 3.508441e+02
1 1.272355e+03 1.255745e+02
2 1.261903e+03 4.520994e+00
3 1.261890e+03 4.146793e-03
4 1.261890e+03 6.025036e-09
2-element Array{Float64,1}:
0.036472878945357355
0.06771710417344215
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Y ~ 1 + X
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 1.00567 0.0364882 27.5614 <1e-99
X 2.42153 0.0674308 35.9113 <1e-99
************************************************************
MLE Estimation Results Convergence: true
Average Log-L: -0.25238 Observations: 5000
Sandwich form covariance estimator
estimate st. err t-stat p-value
1 1.00567 0.03648 27.56426 0.00000
2 2.42153 0.06568 36.87032 0.00000
Information Criteria
Crit. Crit/n
CAIC 2542.81478 0.50856
BIC 2540.81478 0.50816
AIC 2527.78039 0.50556
************************************************************
************************************************************
MLE Estimation Results Convergence: true
Average Log-L: -0.25238 Observations: 5000
Inverse Hessian form covariance estimator
estimate st. err t-stat p-value
1 1.00567 0.03647 27.57314 0.00000
2 2.42153 0.06772 35.75961 0.00000
Information Criteria
Crit. Crit/n
CAIC 2542.81478 0.50856
BIC 2540.81478 0.50816
AIC 2527.78039 0.50556
************************************************************
************************************************************
MLE Estimation Results Convergence: true
Average Log-L: -0.25238 Observations: 5000
OPG form covariance estimator
estimate st. err t-stat p-value
1 1.00567 0.03647 27.57348 0.00000
2 2.42153 0.06984 34.67112 0.00000
Information Criteria
Crit. Crit/n
CAIC 2542.81478 0.50856
BIC 2540.81478 0.50816
AIC 2527.78039 0.50556
************************************************************
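To make that last point concrete, here is a minimal sketch, not the mleresults() internals but my reading of the standard formulas, that builds the inverse Hessian, OPG, and sandwich standard errors by hand from probit2. It uses ForwardDiff (the same package behind autodiff = :forward above) to get the per-observation scores, and it assumes ForwardDiff is installed and can differentiate through cdf.(Normal(), ·), which the Newton run above already relies on.

using ForwardDiff
thetahat = Optim.minimizer(prest)
#n×k matrix of scores: the Jacobian of the un-summed log-likelihood contributions
S = ForwardDiff.jacobian(t -> probit2(t, datap), thetahat)
#Hessian of the summed negative log-likelihood at the MLE
H = ForwardDiff.hessian(t -> probit(t, datap), thetahat)
V_hess = inv(H)                       #inverse Hessian form
V_opg  = inv(S' * S)                  #outer product of gradients form
V_sand = V_hess * (S' * S) * V_hess   #sandwich form
display(sqrt.(diag(V_hess)))
display(sqrt.(diag(V_opg)))
display(sqrt.(diag(V_sand)))

Unlike gra*gra' above, which is the outer product of a single near-zero gradient vector and therefore singular, S' * S sums 5000 rank-one score contributions and gives a well-conditioned k×k matrix.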