@jw3126 @tlienart
This looks really cool.
In the example above only the conditional expectation E[Y|X] changes w/ X. This assumes:
- homoskedasticity: Var[Y|X] is constant.
- the conditional distribution Y|X is normal (due to GLM).
In reality, neither assumption is likely to hold.
I tried this w/ the Boston housing data & got a constant skedastic function:

```julia
using MLJ

X, y = @load_boston;
model = @load LinearRegressor pkg = GLM
mach = machine(model, X, y)
fit!(mach)
y_hat = predict(mach)

# first three predictions: note σ is identical for every observation
# Distributions.Normal{Float64}(μ=30.212372064783466, σ=4.787101274343532)
# Distributions.Normal{Float64}(μ=25.267233800068816, σ=4.787101274343532)
# Distributions.Normal{Float64}(μ=30.849358585028362, σ=4.787101274343532)
```
Assuming you have enough data (you’re gonna need a lot), is it possible to use MLJ to get heteroskedastic predictions?
This is particularly useful in finance/insurance, where users care a lot more about σ than μ.
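One common workaround (just a sketch, not an MLJ feature) is a two-stage fit: model E[Y|X] first, then model the log squared residuals to get a varying Var[Y|X]. The model choices below are placeholders; any mean regressor and any deterministic regressor for the residual stage would do.

```julia
using MLJ, Distributions

X, y = @load_boston;

# Stage 1: conditional mean E[Y|X]
mean_model = @load LinearRegressor pkg = GLM
mean_mach = machine(mean_model, X, y)
fit!(mean_mach)
μ̂ = mean.(predict(mean_mach))      # point predictions from the Normals

# Stage 2: regress log squared residuals on X to get Var[Y|X]
r = log.((y .- μ̂) .^ 2 .+ eps())
var_model = @load DecisionTreeRegressor pkg = DecisionTree
var_mach = machine(var_model, X, r)
fit!(var_mach)
σ̂ = sqrt.(exp.(predict(var_mach)))

# genuinely heteroskedastic Normal predictions
y_hat = Normal.(μ̂, σ̂)
```

Still normal, but at least σ now moves w/ X.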
Is it possible to get predictions which are not always normal? Either the best parametric fit (from some set of candidate families) or a nonparametric one.
(I realize we’re prob gonna have to depart from linear models…)
For example suppose X is 1-dimensional:
For X ∈ (1.0,1.05), Y|X ∼ Distributions.Normal{Float64}(μ=3.3, σ=4.8)
For X ∈ (2.2,2.35), Y|X ∼ Distributions.Normal{Float64}(μ=2.3, σ=1.6)
For X ∈ (2.8,3.10), Y|X ∼ Distributions.LogNormal{Float64}(μ=5.0, σ=3.6)
For X ∈ (3.5,3.72), Y|X ∼ NonParam{Float64}(μ̂=4.0, σ̂=3.3)
In grad school it is very routine to estimate Var[b|X] under heteroskedasticity using robust EHW (Eicker–Huber–White) standard errors, clustered SEs, or FGLS (w/ ML methods), and then do inference on E[Y|X]. But only for linear models.
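For concreteness, the EHW/HC0 sandwich is simple enough to compute by hand; here is a minimal sketch on synthetic heteroskedastic data (no package API assumed beyond LinearAlgebra):

```julia
using LinearAlgebra

# synthetic data where Var[Y|X] grows with |x|
n = 1_000
x = randn(n)
y = 1.0 .+ 2.0 .* x .+ (0.5 .+ abs.(x)) .* randn(n)

X = [ones(n) x]                  # design matrix with intercept
b = X \ y                        # OLS coefficients
e = y .- X * b                   # residuals

bread = inv(X' * X)
meat = X' * Diagonal(e .^ 2) * X
V_hc0 = bread * meat * bread     # EHW (HC0) sandwich estimator
se_robust = sqrt.(diag(V_hc0))
```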
Is there a literature on this I’m not aware of? @Tamas_Papp @fipelle
I’d ultimately like to produce something like @mthelm85’s elegant figure:
Update 1: @fipelle suggests Adrian et al. (2019) (code in Matlab).
Figure 1: One-year-ahead predictive distribution of real GDP growth, based on quantile regressions with current real GDP growth and NFCI as conditioning variables
A shortcoming of this method is that it only produces unimodal predictions…
Technically this is doable in MLJ, which has options for quantile regression.
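For instance (a sketch only: it assumes MLJLinearModels' QuantileRegressor, and the name of its quantile-level parameter, here `delta`, should be checked against the current docs):

```julia
using MLJ

X, y = @load_boston;
model = @load QuantileRegressor pkg = MLJLinearModels

quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
q_hat = map(quantiles) do q
    model.delta = q      # quantile level (keyword name is an assumption)
    mach = machine(model, X, y)
    fit!(mach)
    predict(mach)        # fitted conditional q-quantile at each X
end
```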
In addition, the authors fit a skewed t-distribution (Azzalini & Capitanio 2003) to smooth the quantile function and recover a probability density function.
Azzalini has an R package (sn) for the skew-normal/t distributions.
The skewed-t is currently not in Distributions.jl (@simonbyrne @johnmyleswhite @andreasnoack)
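Until it lands there, the densities themselves are easy to write down w/ just Normal and TDist; the function names below are mine, and the formulas are from Azzalini & Capitanio (2003):

```julia
using Distributions

# skew-normal: f(x) = (2/ω) φ(z) Φ(αz), with z = (x - ξ)/ω
function skewnormal_pdf(x; ξ = 0.0, ω = 1.0, α = 0.0)
    z = (x - ξ) / ω
    2 / ω * pdf(Normal(), z) * cdf(Normal(), α * z)
end

# skew-t: f(x) = (2/ω) t_ν(z) T_{ν+1}(α z √((ν + 1)/(ν + z²)))
function skewt_pdf(x; ξ = 0.0, ω = 1.0, α = 0.0, ν = 5.0)
    z = (x - ξ) / ω
    2 / ω * pdf(TDist(ν), z) *
        cdf(TDist(ν + 1), α * z * sqrt((ν + 1) / (ν + z^2)))
end
```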
Update 2: it looks like @oxinabox's DensityEstimationML.jl could be helpful as well, but I don't see any concise examples right now.
Update 3: an emerging literature extends boosting to probabilistic forecasting:
XGBoostLSS, CatBoostLSS, GAMLSS, gamboostLSS, bamlss, disttree, NGBoost.
Beautiful: Slides & Docs & Guide
XGBoostLSS Paper:
“The ultimate goal of regression analysis is to obtain information about the [entire] conditional distribution of a response given a set of explanatory variables.” (Hothorn et al., 2014, emphasis added)
Julia can really shine here!