Regression: fixed effects + quantiles

I am trying to run a quantile regression with fixed effects. Does anyone know if a package for this exists?

Thanks!

1 Like

We have one package for fixed effects:
https://github.com/FixedEffects

A separate for quantile regression:
https://github.com/pkofod/QuantileRegressions.jl

Currently no package that automates both together.

I would create fixed effects by hand & then run a quantile regression.

@pkofod
@matthieu

3 Likes

Yeah, that’s what I was doing. I guess better than nothing. Thank you!

btw MLJLinearModels also supports QuantileRegression an is generally faster than the package cited above (actually it’s also faster than the specialised R package :man_shrugging:) that being said it’s probably more barebones, depends what you want.

3 Likes

Do you have the benchmarks against quantreg? I’m curious about the design matrices you used.

Without using a sparse designmatrix that seems very expensive :grimacing: In what sense are you including a fixed effect here? How are you keeping it constant across fixed effects in QuantileRegressions.jl?

1 Like

I would create fixed effects by hand & then run a quantile regression.

is a good way forward.
The first part is just creating the usual (ystar,xstar) by individual de-meaning. Then run a quantile regression on that. For instance, here is a working code

"""
    QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)

Quantile regression by using an IRLS algorithm.


Estimate a quantile regression, quantile q. The outputs are the point estimates
and three different variance-covariance matrices of the estimates.

# Input
- `y::Vector`:     T vector, dependent variable
- `x::Matrix`:     TXK, regressors (including any constant)
- `q::Number`:     quantile to estimate at, 0<q<1
- `prec::Float64`: convergence criterion, 1e-8
- `epsu::Float64`: lower bound on 1/weight, 1e-6
- `maxiter::Int`:  maximum number of iterations, 1000

# Output
- `theta::Vector`: K vector, estimated coefficients
- `vcv::Matrix`:   KxK, traditional covariance matrix
- `vcv2::Matrix`:  KxK, Powell (1991) covariance matrix
- `vcv3::Matrix`:  KxK, Powell (1991) covariance matrix, uniform

"""
function QuantRegrIRLSPs(y,x::Matrix,q=0.5;prec=1e-8,epsu=1e-6,maxiter=1000)

  (T,K) = size(x)
  xw    = copy(x)

  (b_old,b,u,iter) = (zeros(K),fill(1e+6,K) .+ prec,zeros(T),0)

  while maximum(abs,b - b_old) > prec
    copyto!(b_old, b)
    b  .= (xw'*x)\(xw'*y)
    u  .= y - x*b
    #u  .= ifelse.(u.>0,1-q,q).*abs.(u)   #as in Python code, divide loss fn by q(1-q) to get it
    u  .= ifelse.(u.>0,1/q,1/(1-q)).*abs.(u)   #abs(u)/q if u>0, abs(u)/(1-q) if u<0
    u  .= max.(u,epsu)                         #not smaller than epsu
    xw .= x./u
    iter = iter + 1
    if iter > maxiter
      @warn("$iter > maxiter")
      b = NaN
      break
    end
  end

  res = y - x*b

  D   = x'x/T
  h   = 1.06*std(res)/T^0.2                        #Silverman (1986) recommendation
  fx  = exp.(-0.5*((res/h).^2))./(h*sqrt(2*pi))    #Green 7th ed, using N() kernel
  f0  = mean(fx)
  C   = f0*x'x/T
  C_1 = inv(C)
  vcv = q*(1-q)*C_1*D*C_1/T                         #variance-covariance matrix

  C    = (fx.*x)'x/T                                #Wooldrige 2dn ed, Powell 1991
  C_1  = inv(C)                                     #but with Gaussian kernel
  vcv2 = q*(1-q)*C_1*D*C_1/T                        #caputures (x,res) dependence

  fx  = (abs.(res) .<= h/2)/h                       #Wooldrige 2nd ed, Powell 1991
  C   = (fx.*x)'x/T                                 #uniform kernel
  C_1 = inv(C)
  vcv3 = q*(1-q)*C_1*D*C_1/T

  return b, vcv, vcv2, vcv3

end
3 Likes

I first modify the data by manually including FEs.
Then I write the formula corresponding to the regression with FE.
This is included here:

#this function runs a quantile regression with fixed effects
function QregFE(Xdata::DataFrame,dep_var::String,indep_vars::Array{String,1},FEs::Array{String,1},icon=true)
    Xdata = copy(Xdata);
    nr = size(Xdata,1);
    indep_vars
    #including FE
    for ife = 1:length(FEs)
        fe_list = sort(unique(Xdata[!,Symbol(FEs[ife])]));
        nFE = length(fe_list);
        for i=1:nFE-1
            Xdata[!,Symbol(FEs[ife]*"_"*string(i))] = Bool.(zeros(nr));
            Xdata[findall(Xdata[!,Symbol(FEs[ife])].==fe_list[i]),Symbol(FEs[ife]*"_"*string(i))] .=true;
            indep_vars = vcat(indep_vars,FEs[ife]*"_"*string(i));
        end
    end
    #creating formula using the strings
    reg_string = "@formula(";
    reg_string = reg_string*" "*dep_var*" ~ ";
    if icon; reg_string = reg_string*" 1 + "; end;
    reg_string = reg_string*join(indep_vars, '+');
    reg_string = reg_string*")";
    fm = eval(Meta.parse(reg_string));
    return fm, Xdata;
end

Finally, I use your function qreg. This is my approach.

1 Like

The formula construction part is much more complicated than it needs to be. You can construct a formula programatically pretty easily.

https://juliastats.org/StatsModels.jl/stable/formula/#Constructing-a-formula-programmatically-1

2 Likes

I used the stuff that’s used in the R package quantreg; but I’d be interested in building a more general test suite for generalised regression; btw speed is usually not a key element for that kind of stuff unless people have β€œbig” data but iirc your package makes use of RWLS (I’m typing on my phone here and may be remembering wrong!) whereas the package I linked to offers a few solvers, some which can be faster.

Anyway this is maybe for another discussion; I’m generally interested in building a generalised regression toolbox in Julia that makes good use of tools that are already available like Optim and IterativeSolvers and can offer all the standard penalised regressions out there that would beat standard regression packages in R and Matlab etc. MLJLM is one (reasonably serious) attempt at that but I’d be happy to do something with more people. (GLMs is a bit too constrained in its design so it would have to be something else)

There were a few other attempts like the defunct Regressions.jl and mostly defunct SparseRegressions and there’s probably others out there…

3 Likes

it should default to the interior point solver but does also have iteratively reweighted least squares. The first hopefully matches quantregs default as it was programmed up according to the article (The Gaussian Hare and the Laplacian Tortoise).

Anyway yeah, it’s all about use-cases. The usecase in OP is quite performance sensitive though, and does require sparse liinear algebra to be acceptable. I know because I’ve estimated these models quite a few times with quite a few individuals. Using the penalized version of Koenker and also the plug-in estimator by (striking out on the author here).

2 Likes

So it turns out that in some case you can do a projected gradient descent and it’s significantly faster than the IP (let alone IRLS) to get to essentially the same coefs; but I’m aware these are always tricky claims to make and I’d be happy to investigate more in depth though maybe a different topic would be more appropriate :slight_smile:

1 Like

Interesting, yeah. Happy to add it directly to QuantileRegressions.jl. You could also consider benchmarking against some of the other non-default solvers in quantreg such as Barrodal and Roberts. I don’t know of the proximal versions are in quantreg but there are some variations out there. Anyway, this is getting a bit off-topic :slight_smile:

1 Like

My experience (& this discussion) makes me feel that Julia stats packages can work better together (be more modular). Currently if I want fixed effects (outside of FixedEffectModels.jl) I have to create them manually (by hand).

It would be great if a general package such as StatsModels.jl could automatically create fixed effects. Then these can be used w/ other stats packages in the ecosystem (such as QuantileRegressions.jl).

I’m not sure what you mean. Non-Real variables are automatically treated as fixed effects in glm.

julia> df = DataFrame(income = rand(1000), edu = rand(1000), state = string.(rand(1:50,1000)));

julia> lm(@formula(income ~ edu + state), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

income ~ 1 + edu + state

Coefficients:
──────────────────────────────────────────────────────────────────────────────
                    Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)   0.507432      0.0681312   7.45    <1e-12   0.373726   0.641137
edu           0.0327742     0.0324312   1.01    0.3125  -0.0308709  0.0964193
1 Like

@pdeffebach
Imagine I have a numeric code for state (like a FIPS code).
Compare: GLM.jl vs QuantileRegressions.jl vs FixedEffectModels.jl

using DataFrames, GLM, QuantileRegressions, FixedEffectModels;
N=1000;
df=DataFrame(income = rand(N), edu = rand(N), state = rand(1:50,N));
f1=@formula(income ~ edu + state)
f2=@formula(income ~ edu + fe(state))
#
a1 = GLM.lm(f1, df)
b1 = qreg(f1, df, .5)
c1 = reg(df, f1)
#
a2 = GLM.lm(f2, df)    #Error
b2 = qreg(f2, df, .5)  #Error
c2 = reg(df, f2)

Formula f1 treats state as a continuous variable & works w/ all packages that accept such formulae.

Formula f2 treats state as a fixed effect, and only works with FixedEffectModels.jl but not with GLM.jl & QuantileRegressions.jl.
It would be nice if fe() was in StatsModels.jl (or somewhere) allowing fe to automatically work w/ all formulas.

Similarly, @matthieu’s Vcov.jl currently only works w/ FixedEffectModels.jl but not yet with GLM.jl etc.

using Vcov
GLM.lm(f1, df, Vcov.cluster(:state))  #Error
reg(df, f1, Vcov.cluster(:state))     #no Error ;)

I hope we can increase modularity in the Julia stats ecosystem…

Just to be clear, in the trivial example above formula f2 can be easily done by hand.
My note is more generally about putting certain features that many different users want like fe into a more general packages available to everybody.
In STATA: reg y F(-1 0 1 2).D i.x1#(c.x2 i.x3) works across methods

I absolutely think we need a better way of using fixed effects in GLM and statsmodels mode generally. I think either fe(x) or categorical(x) or something should be used to refer to fixed effects. For reference, the way to do this in StatsModels currently is

julia> lm(@formula(income ~ edu + state), df, contrasts = Dict(:state => EffectsCoding()))

Also, you have a typo in your post

reg(df, f1, Vcov.cluster(:state))     #Error

This works fine, does not error.

I definitely think that

  1. StatsModels should support fe
  2. FixedEffectModels should support glm and the general ecosystem better.
2 Likes