Here is the GLM interface to MLJ: https://github.com/alan-turing-institute/MLJModels.jl/blob/master/src/GLM.jl
I cannot find p-values anywhere.
However they do have:
```julia
glm_report(fitresult) = (deviance     = GLM.deviance(fitresult),
                         dof_residual = GLM.dof_residual(fitresult),
                         stderror     = GLM.stderror(fitresult),
                         vcov         = GLM.vcov(fitresult))
```
which is accessed via `report(mach)`:
```julia
julia> report(mach)
(deviance = 8502.480903578471,
 dof_residual = 341.0,
 stderror = [0.036681269553499325, 0.01737166925431697, 0.07680917833638921, 4.81712249226012, 0.5341500317712217, 0.016323170996032966, 0.2541450592739458, 0.0841378254119214, 0.004680968995291761, 0.16998443628436907, 0.0033963935217423864, 0.0638108519702615, 6.360054287151586],
 vcov = [0.0013455155360564766 -6.088746040570593e-5 … -0.00039008348465676903 -0.008230320370078825; -6.088746040570593e-5 0.0003017748926813815 … -8.860618288494182e-5 -0.0014983861144144224; … ; -0.00039008348465676903 -8.860618288494182e-5 … 0.004071824829170625 -0.13449472260664377; -0.008230320370078825 -0.0014983861144144224 … -0.13449472260664377 40.450290535515265],)
```
This can be used to obtain t-stats, $t_{\hat{\beta}} = \frac{\hat{\beta} - \beta_0}{\widehat{s.e.}(\hat{\beta})}$, and then p-values $\Pr(>|t|)$ under $H_0$.
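A minimal sketch of that computation, assuming the coefficients come from `fitted_params(mach)` (the numbers below are made-up stand-ins for the `report(mach)` fields, not real output):

```julia
using Distributions  # provides TDist and ccdf

# hypothetical stand-ins for fitted_params(mach) / report(mach) fields:
β̂        = [1.2, -0.5]     # estimated coefficients
stderror = [0.3, 0.2]      # r.stderror
dof      = 10.0            # r.dof_residual

t = β̂ ./ stderror                      # t-stats under H₀: β₀ = 0
p = 2 .* ccdf.(TDist(dof), abs.(t))    # two-sided p-values Pr(>|t|)
```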
A few observations:
- it doesn't look like the current MLJ-GLM interface allows for p-values directly from `glm_report`
- the standard errors currently assume the covariance matrix of the residuals is spherical, $\Omega = \sigma^2 I_n$.
If you're only interested in out-of-sample prediction, you don't care about this.
If you are doing statistical inference and want to submit a paper to a scientific journal, you need more flexibility.
I've never seen a recent finance/econ paper that assumes $\Omega = \sigma^2 I_n$.
- the only honest solution is for MLJ.jl to add a `fit_skedastic!(mach, type)` function, where `type` is the type of standard error the user wants to compute, depending on her assumptions about the unknown $\Omega$.
It can work as follows:
```julia
# step 1: fit a linear model with your favorite MLJ linear regression
julia> LinearRegressor = @load LinearRegressor pkg="MLJLinearModels";

julia> m = LinearRegressor();

julia> mach = machine(m, X, y);

julia> fit!(mach)

# step 2: use residuals from the fitted model to estimate the skedastic function
julia> SE1 = fit_skedastic!(mach, spherical)             # usual naive standard errors

julia> SE2 = fit_skedastic!(mach, EHW)                   # Eicker-Huber-White standard errors

julia> SE3 = fit_skedastic!(mach, cluster(state, year))  # SEs clustered by state-year
```
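To make concrete what the `spherical` and `EHW` options would compute under the hood, here is a toy sketch (textbook formulas on made-up data, not an existing MLJ API) of the naive and HC0 sandwich variance estimators built from OLS residuals:

```julia
using LinearAlgebra

# toy data, purely illustrative: n = 6 observations, k = 2 regressors
X = [ones(6) [1.0, 2, 3, 4, 5, 6]]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

β̂      = X \ y          # OLS coefficients
ε̂      = y - X * β̂      # residuals
XtXinv = inv(X' * X)

# spherical: Ω = σ²Iₙ → the usual naive standard errors
σ̂² = dot(ε̂, ε̂) / (size(X, 1) - size(X, 2))
se_naive = sqrt.(diag(σ̂² * XtXinv))

# Eicker-Huber-White (HC0): Ω = diag(ε̂²) → heteroskedasticity-robust
V_ehw  = XtXinv * (X' * Diagonal(ε̂ .^ 2) * X) * XtXinv
se_ehw = sqrt.(diag(V_ehw))
```

Clustered variants replace `Diagonal(ε̂ .^ 2)` with sums of scores within each cluster, which is where a `cluster(state, year)` option would plug in.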
MLJ.jl can actually be a great place for this, because recently there has been a lot of interest in using ML methods to estimate skedastic functions for better statistical inference.
This could be super helpful to the research community (myself included), because currently computing multiple standard errors is a royal pain. Journals often want to see robust standard errors computed in different ways under various assumptions about the unknown data-generating process. In a recent paper I computed standard errors using 11 different methods.