# Regression: fixed effects + quantiles

I am trying to run a quantile regression with fixed effects. Does anyone know if a package for this exists?

Thanks!

1 Like

We have one package for fixed effects:

A separate for quantile regression:

Currently no package that automates both together.

I would create fixed effects by hand & then run a quantile regression.

3 Likes

Yeah, that’s what I was doing. I guess better than nothing. Thank you!

btw MLJLinearModels also supports QuantileRegression an is generally faster than the package cited above (actually it’s also faster than the specialised R package ) that being said it’s probably more barebones, depends what you want.

3 Likes

Do you have the benchmarks against quantreg? I’m curious about the design matrices you used.

Without using a sparse designmatrix that seems very expensive In what sense are you including a fixed effect here? How are you keeping it constant across fixed effects in QuantileRegressions.jl?

1 Like

I would create fixed effects by hand & then run a quantile regression.

is a good way forward.
The first part is just creating the usual (ystar,xstar) by individual de-meaning. Then run a quantile regression on that. For instance, here is a working code

``````"""
QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)

Quantile regression by using an IRLS algorithm.

Estimate a quantile regression, quantile q. The outputs are the point estimates
and three different variance-covariance matrices of the estimates.

# Input
- `y::Vector`:     T vector, dependent variable
- `x::Matrix`:     TXK, regressors (including any constant)
- `q::Number`:     quantile to estimate at, 0<q<1
- `prec::Float64`: convergence criterion, 1e-8
- `epsu::Float64`: lower bound on 1/weight, 1e-6
- `maxiter::Int`:  maximum number of iterations, 1000

# Output
- `theta::Vector`: K vector, estimated coefficients
- `vcv::Matrix`:   KxK, traditional covariance matrix
- `vcv2::Matrix`:  KxK, Powell (1991) covariance matrix
- `vcv3::Matrix`:  KxK, Powell (1991) covariance matrix, uniform

"""
function QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)

(T,K) = size(x)
xw    = copy(x)

(b_old,b,u,iter) = (zeros(K),fill(1e+6,K) .+ prec,zeros(T),0)

while maximum(abs,b - b_old) > prec
copyto!(b_old, b)
b  .= (xw'*x)\(xw'*y)
u  .= y - x*b
#u  .= ifelse.(u.>0,1-q,q).*abs.(u)   #as in Python code, divide loss fn by q(1-q) to get it
u  .= ifelse.(u.>0,1/q,1/(1-q)).*abs.(u)   #abs(u)/q if u>0, abs(u)/(1-q) if u<0
u  .= max.(u,epsu)                         #not smaller than epsu
xw .= x./u
iter = iter + 1
if iter > maxiter
@warn("\$iter > maxiter")
b = NaN
break
end
end

res = y - x*b

D   = x'x/T
h   = 1.06*std(res)/T^0.2                        #Silverman (1986) recommendation
fx  = exp.(-0.5*((res/h).^2))./(h*sqrt(2*pi))    #Green 7th ed, using N() kernel
f0  = mean(fx)
C   = f0*x'x/T
C_1 = inv(C)
vcv = q*(1-q)*C_1*D*C_1/T                         #variance-covariance matrix

C    = (fx.*x)'x/T                                #Wooldrige 2dn ed, Powell 1991
C_1  = inv(C)                                     #but with Gaussian kernel
vcv2 = q*(1-q)*C_1*D*C_1/T                        #caputures (x,res) dependence

fx  = (abs.(res) .<= h/2)/h                       #Wooldrige 2nd ed, Powell 1991
C   = (fx.*x)'x/T                                 #uniform kernel
C_1 = inv(C)
vcv3 = q*(1-q)*C_1*D*C_1/T

return b, vcv, vcv2, vcv3

end
``````
3 Likes

I first modify the data by manually including FEs.
Then I write the formula corresponding to the regression with FE.
This is included here:

``````#this function runs a quantile regression with fixed effects
function QregFE(Xdata::DataFrame,dep_var::String,indep_vars::Array{String,1},FEs::Array{String,1},icon=true)
Xdata = copy(Xdata);
nr = size(Xdata,1);
indep_vars
#including FE
for ife = 1:length(FEs)
fe_list = sort(unique(Xdata[!,Symbol(FEs[ife])]));
nFE = length(fe_list);
for i=1:nFE-1
Xdata[!,Symbol(FEs[ife]*"_"*string(i))] = Bool.(zeros(nr));
Xdata[findall(Xdata[!,Symbol(FEs[ife])].==fe_list[i]),Symbol(FEs[ife]*"_"*string(i))] .=true;
indep_vars = vcat(indep_vars,FEs[ife]*"_"*string(i));
end
end
#creating formula using the strings
reg_string = "@formula(";
reg_string = reg_string*" "*dep_var*" ~ ";
if icon; reg_string = reg_string*" 1 + "; end;
reg_string = reg_string*join(indep_vars, '+');
reg_string = reg_string*")";
fm = eval(Meta.parse(reg_string));
return fm, Xdata;
end
``````

Finally, I use your function `qreg`. This is my approach.

1 Like

The formula construction part is much more complicated than it needs to be. You can construct a formula programatically pretty easily.

https://juliastats.org/StatsModels.jl/stable/formula/#Constructing-a-formula-programmatically-1

2 Likes

I used the stuff that’s used in the R package quantreg; but I’d be interested in building a more general test suite for generalised regression; btw speed is usually not a key element for that kind of stuff unless people have “big” data but iirc your package makes use of RWLS (I’m typing on my phone here and may be remembering wrong!) whereas the package I linked to offers a few solvers, some which can be faster.

Anyway this is maybe for another discussion; I’m generally interested in building a generalised regression toolbox in Julia that makes good use of tools that are already available like Optim and IterativeSolvers and can offer all the standard penalised regressions out there that would beat standard regression packages in R and Matlab etc. MLJLM is one (reasonably serious) attempt at that but I’d be happy to do something with more people. (GLMs is a bit too constrained in its design so it would have to be something else)

There were a few other attempts like the defunct Regressions.jl and mostly defunct SparseRegressions and there’s probably others out there…

3 Likes

it should default to the interior point solver but does also have iteratively reweighted least squares. The first hopefully matches quantregs default as it was programmed up according to the article (The Gaussian Hare and the Laplacian Tortoise).

Anyway yeah, it’s all about use-cases. The usecase in OP is quite performance sensitive though, and does require sparse liinear algebra to be acceptable. I know because I’ve estimated these models quite a few times with quite a few individuals. Using the penalized version of Koenker and also the plug-in estimator by (striking out on the author here).

2 Likes

So it turns out that in some case you can do a projected gradient descent and it’s significantly faster than the IP (let alone IRLS) to get to essentially the same coefs; but I’m aware these are always tricky claims to make and I’d be happy to investigate more in depth though maybe a different topic would be more appropriate 1 Like

Interesting, yeah. Happy to add it directly to QuantileRegressions.jl. You could also consider benchmarking against some of the other non-default solvers in quantreg such as Barrodal and Roberts. I don’t know of the proximal versions are in quantreg but there are some variations out there. Anyway, this is getting a bit off-topic 1 Like

My experience (& this discussion) makes me feel that Julia stats packages can work better together (be more modular). Currently if I want fixed effects (outside of FixedEffectModels.jl) I have to create them manually (by hand).

It would be great if a general package such as `StatsModels.jl` could automatically create fixed effects. Then these can be used w/ other stats packages in the ecosystem (such as `QuantileRegressions.jl`).

I’m not sure what you mean. Non-`Real` variables are automatically treated as fixed effects in `glm`.

``````julia> df = DataFrame(income = rand(1000), edu = rand(1000), state = string.(rand(1:50,1000)));

julia> lm(@formula(income ~ edu + state), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

income ~ 1 + edu + state

Coefficients:
──────────────────────────────────────────────────────────────────────────────
Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)   0.507432      0.0681312   7.45    <1e-12   0.373726   0.641137
edu           0.0327742     0.0324312   1.01    0.3125  -0.0308709  0.0964193
``````
1 Like

@pdeffebach
Imagine I have a numeric code for `state` (like a FIPS code).
Compare: GLM.jl vs QuantileRegressions.jl vs FixedEffectModels.jl

``````using DataFrames, GLM, QuantileRegressions, FixedEffectModels;
N=1000;
df=DataFrame(income = rand(N), edu = rand(N), state = rand(1:50,N));
f1=@formula(income ~ edu + state)
f2=@formula(income ~ edu + fe(state))
#
a1 = GLM.lm(f1, df)
b1 = qreg(f1, df, .5)
c1 = reg(df, f1)
#
a2 = GLM.lm(f2, df)    #Error
b2 = qreg(f2, df, .5)  #Error
c2 = reg(df, f2)
``````

Formula `f1` treats `state` as a continuous variable & works w/ all packages that accept such formulae.

Formula `f2` treats `state` as a fixed effect, and only works with FixedEffectModels.jl but not with GLM.jl & QuantileRegressions.jl.
It would be nice if `fe()` was in `StatsModels.jl` (or somewhere) allowing fe to automatically work w/ all formulas.

Similarly, @matthieu’s Vcov.jl currently only works w/ FixedEffectModels.jl but not yet with GLM.jl etc.

``````using Vcov
GLM.lm(f1, df, Vcov.cluster(:state))  #Error
reg(df, f1, Vcov.cluster(:state))     #no Error ;)
``````

I hope we can increase modularity in the Julia stats ecosystem…

Just to be clear, in the trivial example above formula `f2` can be easily done by hand.
My note is more generally about putting certain features that many different users want like fe into a more general packages available to everybody.
In STATA: `reg y F(-1 0 1 2).D i.x1#(c.x2 i.x3)` works across methods

I absolutely think we need a better way of using fixed effects in GLM and statsmodels mode generally. I think either `fe(x)` or `categorical(x)` or something should be used to refer to fixed effects. For reference, the way to do this in StatsModels currently is

``````julia> lm(@formula(income ~ edu + state), df, contrasts = Dict(:state => EffectsCoding()))
``````

Also, you have a typo in your post

``````reg(df, f1, Vcov.cluster(:state))     #Error
``````

This works fine, does not error.

I definitely think that

1. StatsModels should support `fe`
2. FixedEffectModels should support `glm` and the general ecosystem better.
2 Likes