Classical Hypothesis tests: Wald, LR, LM

I was able to code up the LR & Wald tests.
I need help w/ the score test. Does anyone know how to obtain the loglik functions after glm?

using DataFrames, GLM, Distributions

n=7; k=1; X=randn(n,k); β=[2 ;-4]; σ_e = 10.1;
Y= [ones(n) X]*β + σ_e*randn(n);

d = DataFrame(X1= X[:,1], Y=Y);
#d = DataFrame(X1= X[:,1], X2= X[:,2], Y=Y);
f0 = @formula(Y ~ 1)
fA = @formula(Y ~ 1 + X1)
mH0 = lm(f0, d)
mHA = lm(fA, d)
mH0 = glm(f0, d, Normal())
mHA = glm(fA, d, Normal())

# Likelihood Ratio Test
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
HT_LR(mH0, mHA) #better practice
HT_LR(mHA, mH0)
# Wald Test
"H0: R*θ = r"
"HA: R*θ ≂̸ r"
function HT_Wald(mHA, R, r, V)
    θ̂ = coef(mHA)
    A = (R*θ̂ - r)
    #V = vcov(mHA)  #[2,2]
    n = size(,1)
    W = A' * inv(R*V*R') * A   #V/n
    df = size(R,1)
    pv = 1 - cdf(Chisq(df), W)
    return W, pv, df
R = [0 1]
r = [0]
HT_Wald(mHA, R, r, vcov(mHA))

"LM aka Score test Matlab code"
G = gradp(@nll_lin,theta_r_full,datamat,1);
H_r = HessMp(@nll_lin,theta_r_full,datamat,1);
V_r = inv(H_r);
LM = G*V_r*G';
LM_p = 1-chi2cdf(LM,2);
1 Like

You might find the code in useful, it has the score test for the linear regression model with classical assumptions.

julia> TestStatistics();
│       │        Value │      p-value │
│    qF │      0.93966 │      0.29772 │
│  Wald │      1.04407 │      0.33153 │
│    LR │      1.02631 │      0.32581 │
│ Score │      1.00895 │      0.32021 │



I think you want something like loglik_obs here?

Also StatsModels has something called HypothesisCoding for a matrix. Not sure what it means but seems like it could be useful.


@mcreel thanks for the link.
My understanding is that Wald/LR/LM tests are asymptotically equivalent but:
Finite sample statistics: Wald > LR > LM (as in your table)
Finite sample p-values: Wald < LR < LM (opposite of your table)
I think your table can be fixed by
replacing: pvalues = chisqccdf.(tests,q)
with: pvalues = ccdf.(Chisq(q), tests)

I think you’re goal is to write out the entire likelihood for educational purposes.
The LRT can work w/ any Julia object w/ loglikelihood(m)

function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)  # ccdf(Chisq(df), LR)
    return LR, pv, df

Thanks! You’re right about the p-values. I have fixed that.

1 Like

@mcreel should your qF statistic be distributed FDist(q, n-k) under the H0?
(not sure about this)

Bruce Hansen’s Econometrics p 240 discusses this. Basically your code is correct.

W(\theta) \sim_{\text{asymptotically}} \chi^2_q
The F version of the test: F = W/q
It’s conventional to use F_{q,n-k} p-values instead of \chi^2_q.

While there is no formal justification to using the F_{q,n-k} distribution for non-homoskedastic covariance matrices, the F_{q,n-k} distribution provides continuity with the exact distribution theory under normality and is a bit more conservative than the \chi^2_q distribution. (Furthermore, the difference is small when n−k is moderately large.)

A few comments.

Depending on the structure of the covariance matrix it may be an F-test or a Wald-test.

The hypotheses and restrictions need to take into account the contrasts specified in the model.

The residual degrees of freedom for the test under the null hypothesis should come from the model
dof_residual and there are various options. For example, ridge models would use effective degrees of freedom in the computation and linear mixed models can choose any of several approximations (there isn’t a “correct way” just flavored approximations which are more or less suitable depending on various factors).

For models that are not maximum likelihood (e.g. restricted maximum likelihood) there are a number of caveats such as no true LR-test and you can use the “on the deviance scale” metrics but requires the same structure of fixed effects.

The more general approach is the restrictions based on linear combinations.

The main difference between LR test and Wald is that the Wald test allows for testing any number of linear combinations and hypothesis tests including incorporating the information from the variance covariance estimates all with a single fitted model. The likelihood ratio test is more powerful but requires re-fitting the model for each combination.

1 Like

@pdeffebach thanks. I tried that yesterday but gave up.
Just tried again. Close, but no cigar (yet):

using Zygote
function AZll(m, β)
    r   = m.model.rr
    y   = r.y
    ll  = 0.0  #zero(eltype(mu))
    ϕ = deviance(m)/length(y)
    x = m.model.pp.X
    @inbounds for i in eachindex(y)
        ll += logpdf(Normal(x[i,:]'*β, sqrt(ϕ)), y[i])
AZll(mHA, coef(mHA))
AZ(β) = AZll(mHA, β)

AZ([coef(mH0); 0.0])
Sc= Zygote.forward_jacobian(AZ, [coef(mH0); 0.0])[2]
FI= -Zygote.hessian(AZ, [coef(mH0); 0.0])
LM = (Sc' * inv(FI) * Sc)[1]
LM ∼ Χ²(q)

Can you also try Distribtions.gradloglikpdf?

1 Like