# Replicating ologit results from Econometrics.jl

I’m having a hard time replicating the results of an ordered logit regression from `Econometrics.jl`. The estimation code is here, and the objective function is here.

```julia
using ComponentArrays
using UnPack
using Econometrics
using Distributions
using Optim
using CategoricalArrays
using DataFrames
using CSV

# `datadf` holds the data (loading elided here); `datadfnm` is the version
# with rows missing `rep77` dropped.
datadf.rep77 = recode(datadf.rep77, "Poor" => 1, "Fair" => 2, "Average" => 3, "Good" => 4, "Excellent" => 5)

# Econometrics.jl results
ecn = fit(EconometricModel, @formula(rep77 ~ foreign + length + mpg), datadfnm)

# my implementation
θ = ComponentArray{Float64}(β = [0.1, 0.1, 0.1], K = (1:4) ./ 4)

# logistic CDF
Λ(x) = cdf(Logistic(), x)

# cumulative sum with absolute increments, so thresholds are non-decreasing
function mycumsum(K)
    n = length(K)
    s = similar(K)
    s[1] = K[1]
    for i in 2:n
        s[i] = abs(K[i]) + s[i-1]
    end
    return s
end

# probability of outcome category j given parameters θ and covariates x
function Pi(j, θ, x)
    @unpack β, K = θ
    KK = mycumsum(K)
    kk = length(K)
    n = length(x)
    xb = sum(x[i] * β[i] for i in 1:n)
    if 2 <= j <= kk
        return Λ(KK[j] - xb) - Λ(KK[j-1] - xb)
    elseif j == 1
        return Λ(KK[1] - xb)
    elseif j == kk + 1
        return 1 - Λ(KK[kk] - xb)
    end
end

boole(x, pred) = x == pred

function logℒ(θ; Yvec = Yvec, Xmat = Xmat)
    lev = unique(Yvec)
    s = zero(eltype(θ))
    for i in 1:size(Xmat, 1), j in 1:length(lev)
        xi = @view(Xmat[i, :])
        s += log(Pi(j, θ, xi)) * boole(Yvec[i], lev[j])
    end
    return s
end

Xmat = Matrix(datadfnm[!, [:foreign, :length, :mpg]])
Yvec = datadfnm.rep77  # outcome vector

# starting values
θ0 = similar(θ)
θ0.β .= 0.0
θ0.K .= log.(1:4)

opt = optimize(θ -> -logℒ(θ), θ0, Newton(), Optim.Options(g_tol = 1e-10, iterations = 10_000); autodiff = :forward)

# my coefficients
opt.minimizer.β
# my thresholds
opt.minimizer.K |> mycumsum

# Econometrics.jl coefficients and thresholds:
ecn.β
```

Can anyone give me some advice?

I think your best bet is to summon @Nosferican. In these types of situations I usually use RCall to try and get a second opinion as well.

1 Like

I have been summoned.
Did you take a look at the test suite? The results are verified against Stata 13 `ologit` (they should be the same, or quite close, in Stata 15) and R’s `MASS::polr`. Did you check your implementation against other software?

3 Likes

Stata 16.1 and `Econometrics.jl` output the same results, so the discrepancy is on my end. I still don’t see how my implementation differs: where in the algorithm am I doing something fundamentally different from `Econometrics.jl`?

I need to run an ologit many times, which is why I’m trying to implement a barebones version of the algorithm.

1 Like

The easiest implementation to “borrow from” would be

You can adapt the “barebones” routine and re-use arrays and scratch objects for a speed-up where applicable. If you have guarantees that let you skip checks for linearly separable cases, perfect multicollinearity, and so on, you can drop them to make the routine less “safe” but potentially faster. It depends on the specifics of your use case.
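To make the “re-use scratch objects” idea concrete, here is a hedged sketch (not the `Econometrics.jl` internals): the cumulative-threshold buffer `KK` is allocated once and reused across calls, rows of `X` are read through views, and the outcome is assumed to be integer-coded `1:k+1` against sorted levels with `K` already holding positive increments. The names `loglik!`, `KK`, etc. are made up for illustration.

```julia
using Distributions, LinearAlgebra

Λ(x) = cdf(Logistic(), x)

# Allocation-light ordered-logit log-likelihood (sketch).
# KK : preallocated buffer for cumulative thresholds
# β  : coefficient vector; K : positive threshold increments
# Y  : outcomes coded 1..k+1 (sorted levels); X : design matrix
function loglik!(KK, β, K, Y, X)
    cumsum!(KK, K)              # thresholds, reusing the buffer
    k = length(K)
    s = zero(eltype(β))
    for i in axes(X, 1)
        xb = dot(view(X, i, :), β)   # no per-row copy
        j  = Y[i]
        p  = j == 1     ? Λ(KK[1] - xb) :
             j == k + 1 ? 1 - Λ(KK[k] - xb) :
                          Λ(KK[j] - xb) - Λ(KK[j-1] - xb)
        s += log(p)
    end
    return s
end
```

For repeated fits you would allocate `KK` (and any other scratch) once outside the optimization loop and close over it in the objective.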

2 Likes

I finally figured it out!
The problem was that `unique(Yvec)` does not yield a sorted vector. Once I changed the line `lev = unique(Yvec)` to `lev = levels(Yvec)` I got the same result as `Econometrics.jl` (and Stata).
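For anyone who hits the same thing: `unique` returns elements in order of first appearance, so category index `j` in the likelihood loop need not line up with threshold `j`. A tiny made-up illustration:

```julia
# `unique` preserves encounter order; the likelihood indexing assumes sorted levels.
y = [3, 1, 3, 2, 1]

unique(y)        # [3, 1, 2] -- first-appearance order, not sorted
sort(unique(y))  # [1, 2, 3] -- the ordering the threshold indexing assumes
```

(With a `CategoricalArray`, `levels` returns the sorted level set, which is why the fix works.)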

As an aside: I use `abs` to compute the increasing thresholds, while you, @Nosferican, use `exp`. For values away from 0 it should be fine, I think. I have used `abs` in other cases to ensure that some quantity remains positive, but is there any concern with using `abs` rather than `exp`?

1 Like

From what I recall, the `exp` method is definitely better. I think that was a numerical trick I learned from the MASS authors.
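One concrete difference, sketched with the same cumulative-threshold idea as in the post above: `exp` keeps every increment strictly positive and is smooth everywhere, while `abs` permits a zero increment (tied thresholds) and has a kink at 0 that Newton with forward-mode autodiff can stumble over. The helper names here are made up for illustration.

```julia
# Two maps from unconstrained parameters K to ordered thresholds (sketch):
cuts_abs(K) = cumsum(vcat(K[1], abs.(K[2:end])))  # non-decreasing; kink at 0
cuts_exp(K) = cumsum(vcat(K[1], exp.(K[2:end])))  # strictly increasing; smooth

K = [-1.0, 0.5, 0.0]
cuts_abs(K)  # [-1.0, -0.5, -0.5] -- zero increment gives tied thresholds
cuts_exp(K)  # strictly increasing, since exp(0) = 1
```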

2 Likes