I am trying to run a quantile regression with fixed effects. Does anyone know if a package for this exists?
Thanks!
I am trying to run a quantile regression with fixed effects. Does anyone know if a package for this exists?
Thanks!
We have one package for fixed effects:
https://github.com/FixedEffects
A separate for quantile regression:
https://github.com/pkofod/QuantileRegressions.jl
Currently no package that automates both together.
I would create fixed effects by hand & then run a quantile regression.
Yeah, thatβs what I was doing. I guess better than nothing. Thank you!
btw MLJLinearModels also supports QuantileRegression an is generally faster than the package cited above (actually itβs also faster than the specialised R package ) that being said itβs probably more barebones, depends what you want.
Do you have the benchmarks against quantreg? Iβm curious about the design matrices you used.
Without using a sparse designmatrix that seems very expensive In what sense are you including a fixed effect here? How are you keeping it constant across fixed effects in QuantileRegressions.jl?
I would create fixed effects by hand & then run a quantile regression.
is a good way forward.
The first part is just creating the usual (ystar,xstar) by individual de-meaning. Then run a quantile regression on that. For instance, here is a working code
"""
QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)
Quantile regression by using an IRLS algorithm.
Estimate a quantile regression, quantile q. The outputs are the point estimates
and three different variance-covariance matrices of the estimates.
# Input
- `y::Vector`: T vector, dependent variable
- `x::Matrix`: TXK, regressors (including any constant)
- `q::Number`: quantile to estimate at, 0<q<1
- `prec::Float64`: convergence criterion, 1e-8
- `epsu::Float64`: lower bound on 1/weight, 1e-6
- `maxiter::Int`: maximum number of iterations, 1000
# Output
- `theta::Vector`: K vector, estimated coefficients
- `vcv::Matrix`: KxK, traditional covariance matrix
- `vcv2::Matrix`: KxK, Powell (1991) covariance matrix
- `vcv3::Matrix`: KxK, Powell (1991) covariance matrix, uniform
"""
function QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)
(T,K) = size(x)
xw = copy(x)
(b_old,b,u,iter) = (zeros(K),fill(1e+6,K) .+ prec,zeros(T),0)
while maximum(abs,b - b_old) > prec
copyto!(b_old, b)
b .= (xw'*x)\(xw'*y)
u .= y - x*b
#u .= ifelse.(u.>0,1-q,q).*abs.(u) #as in Python code, divide loss fn by q(1-q) to get it
u .= ifelse.(u.>0,1/q,1/(1-q)).*abs.(u) #abs(u)/q if u>0, abs(u)/(1-q) if u<0
u .= max.(u,epsu) #not smaller than epsu
xw .= x./u
iter = iter + 1
if iter > maxiter
@warn("$iter > maxiter")
b = NaN
break
end
end
res = y - x*b
D = x'x/T
h = 1.06*std(res)/T^0.2 #Silverman (1986) recommendation
fx = exp.(-0.5*((res/h).^2))./(h*sqrt(2*pi)) #Green 7th ed, using N() kernel
f0 = mean(fx)
C = f0*x'x/T
C_1 = inv(C)
vcv = q*(1-q)*C_1*D*C_1/T #variance-covariance matrix
C = (fx.*x)'x/T #Wooldrige 2dn ed, Powell 1991
C_1 = inv(C) #but with Gaussian kernel
vcv2 = q*(1-q)*C_1*D*C_1/T #caputures (x,res) dependence
fx = (abs.(res) .<= h/2)/h #Wooldrige 2nd ed, Powell 1991
C = (fx.*x)'x/T #uniform kernel
C_1 = inv(C)
vcv3 = q*(1-q)*C_1*D*C_1/T
return b, vcv, vcv2, vcv3
end
I first modify the data by manually including FEs.
Then I write the formula corresponding to the regression with FE.
This is included here:
#this function runs a quantile regression with fixed effects
function QregFE(Xdata::DataFrame,dep_var::String,indep_vars::Array{String,1},FEs::Array{String,1},icon=true)
Xdata = copy(Xdata);
nr = size(Xdata,1);
indep_vars
#including FE
for ife = 1:length(FEs)
fe_list = sort(unique(Xdata[!,Symbol(FEs[ife])]));
nFE = length(fe_list);
for i=1:nFE-1
Xdata[!,Symbol(FEs[ife]*"_"*string(i))] = Bool.(zeros(nr));
Xdata[findall(Xdata[!,Symbol(FEs[ife])].==fe_list[i]),Symbol(FEs[ife]*"_"*string(i))] .=true;
indep_vars = vcat(indep_vars,FEs[ife]*"_"*string(i));
end
end
#creating formula using the strings
reg_string = "@formula(";
reg_string = reg_string*" "*dep_var*" ~ ";
if icon; reg_string = reg_string*" 1 + "; end;
reg_string = reg_string*join(indep_vars, '+');
reg_string = reg_string*")";
fm = eval(Meta.parse(reg_string));
return fm, Xdata;
end
Finally, I use your function qreg
. This is my approach.
The formula construction part is much more complicated than it needs to be. You can construct a formula programatically pretty easily.
https://juliastats.org/StatsModels.jl/stable/formula/#Constructing-a-formula-programmatically-1
I used the stuff thatβs used in the R package quantreg; but Iβd be interested in building a more general test suite for generalised regression; btw speed is usually not a key element for that kind of stuff unless people have βbigβ data but iirc your package makes use of RWLS (Iβm typing on my phone here and may be remembering wrong!) whereas the package I linked to offers a few solvers, some which can be faster.
Anyway this is maybe for another discussion; Iβm generally interested in building a generalised regression toolbox in Julia that makes good use of tools that are already available like Optim and IterativeSolvers and can offer all the standard penalised regressions out there that would beat standard regression packages in R and Matlab etc. MLJLM is one (reasonably serious) attempt at that but Iβd be happy to do something with more people. (GLMs is a bit too constrained in its design so it would have to be something else)
There were a few other attempts like the defunct Regressions.jl and mostly defunct SparseRegressions and thereβs probably others out thereβ¦
it should default to the interior point solver but does also have iteratively reweighted least squares. The first hopefully matches quantregs default as it was programmed up according to the article (The Gaussian Hare and the Laplacian Tortoise).
Anyway yeah, itβs all about use-cases. The usecase in OP is quite performance sensitive though, and does require sparse liinear algebra to be acceptable. I know because Iβve estimated these models quite a few times with quite a few individuals. Using the penalized version of Koenker and also the plug-in estimator by (striking out on the author here).
So it turns out that in some case you can do a projected gradient descent and itβs significantly faster than the IP (let alone IRLS) to get to essentially the same coefs; but Iβm aware these are always tricky claims to make and Iβd be happy to investigate more in depth though maybe a different topic would be more appropriate
Interesting, yeah. Happy to add it directly to QuantileRegressions.jl. You could also consider benchmarking against some of the other non-default solvers in quantreg such as Barrodal and Roberts. I donβt know of the proximal versions are in quantreg but there are some variations out there. Anyway, this is getting a bit off-topic
My experience (& this discussion) makes me feel that Julia stats packages can work better together (be more modular). Currently if I want fixed effects (outside of FixedEffectModels.jl) I have to create them manually (by hand).
It would be great if a general package such as StatsModels.jl
could automatically create fixed effects. Then these can be used w/ other stats packages in the ecosystem (such as QuantileRegressions.jl
).
Iβm not sure what you mean. Non-Real
variables are automatically treated as fixed effects in glm
.
julia> df = DataFrame(income = rand(1000), edu = rand(1000), state = string.(rand(1:50,1000)));
julia> lm(@formula(income ~ edu + state), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
income ~ 1 + edu + state
Coefficients:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
(Intercept) 0.507432 0.0681312 7.45 <1e-12 0.373726 0.641137
edu 0.0327742 0.0324312 1.01 0.3125 -0.0308709 0.0964193
@pdeffebach
Imagine I have a numeric code for state
(like a FIPS code).
Compare: GLM.jl vs QuantileRegressions.jl vs FixedEffectModels.jl
using DataFrames, GLM, QuantileRegressions, FixedEffectModels;
N=1000;
df=DataFrame(income = rand(N), edu = rand(N), state = rand(1:50,N));
f1=@formula(income ~ edu + state)
f2=@formula(income ~ edu + fe(state))
#
a1 = GLM.lm(f1, df)
b1 = qreg(f1, df, .5)
c1 = reg(df, f1)
#
a2 = GLM.lm(f2, df) #Error
b2 = qreg(f2, df, .5) #Error
c2 = reg(df, f2)
Formula f1
treats state
as a continuous variable & works w/ all packages that accept such formulae.
Formula f2
treats state
as a fixed effect, and only works with FixedEffectModels.jl but not with GLM.jl & QuantileRegressions.jl.
It would be nice if fe()
was in StatsModels.jl
(or somewhere) allowing fe to automatically work w/ all formulas.
Similarly, @matthieuβs Vcov.jl currently only works w/ FixedEffectModels.jl but not yet with GLM.jl etc.
using Vcov
GLM.lm(f1, df, Vcov.cluster(:state)) #Error
reg(df, f1, Vcov.cluster(:state)) #no Error ;)
I hope we can increase modularity in the Julia stats ecosystemβ¦
Just to be clear, in the trivial example above formula f2
can be easily done by hand.
My note is more generally about putting certain features that many different users want like fe into a more general packages available to everybody.
In STATA: reg y F(-1 0 1 2).D i.x1#(c.x2 i.x3)
works across methods
I absolutely think we need a better way of using fixed effects in GLM and statsmodels mode generally. I think either fe(x)
or categorical(x)
or something should be used to refer to fixed effects. For reference, the way to do this in StatsModels currently is
julia> lm(@formula(income ~ edu + state), df, contrasts = Dict(:state => EffectsCoding()))
Also, you have a typo in your post
reg(df, f1, Vcov.cluster(:state)) #Error
This works fine, does not error.
I definitely think that
fe
glm
and the general ecosystem better.