I am trying to use FixedEffectModels.jl on a dataset with a continuous explanatory variable, and I want to run regressions on a binned version of this variable. I have two approaches: (a) use cut to produce a categorical variable, or (b) use map to assign each observation a numeric bin code, which can then be treated as a set of dummy variables via “contrasts”. (I am sure there are more efficient ways to do the binning.)
I have two questions:
- If I use the categorical variable, is there a way to pick the “base” (reference) category, as I can for the dummy variable with DummyCoding(base = ...)?
- The two regressions give slightly different results: the intercept is the same, but the coefficients on the two upper bins differ. Why would that be?
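For the first question, one possible approach is to pass the categorical column through the same `contrasts` keyword that is used for `DummyPrice`. This is a sketch, assuming that `DummyCoding(base = ...)` accepts a level of a `CategoricalArray` column in the same way it accepts an integer for the numeric column. For a `CategoricalArray`, the base has to match one of the level labels, which `levels` returns as strings, so the label is read from there rather than typed by hand. Choosing the top bin as the base is only for illustration.

```julia
using DataFrames, RDatasets, FixedEffectModels, CategoricalArrays, StatsBase

df = dataset("plm", "Cigar")

# Bin Price into [0, 75), [75, 100), [100, max]; extend = true stretches the last bin to the maximum
df.CategoricalPrice = cut(df.Price, [0, 75, 100]; extend = true)

# Level labels come back in ascending bin order; take the top bin as the reference (illustrative choice)
base_level = levels(df.CategoricalPrice)[end]

# Same contrasts mechanism as for DummyPrice, but the base is a level label instead of an integer
m = reg(df, @formula(Sales ~ CategoricalPrice);
        contrasts = Dict(:CategoricalPrice => DummyCoding(base = base_level)))

# The base bin should no longer appear among the coefficient names
println(coefnames(m))
```

Reordering the levels (for example with `levels!`) so that the desired bin comes first should have the same effect, since `DummyCoding` uses the first level as the base by default.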
# Categorical regression
Linear Model
===========================================================================================
Number of obs: 1380 Degrees of freedom: 2
R2: 0.092 R2 Adjusted: 0.091
F-Stat: 70.1648 p-value: 0.000
===========================================================================================
Sales | Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
-------------------------------------------------------------------------------------------
CategoricalPrice: [75.0, 100.0) | -5.06421 2.77783 -1.82308 0.069 -10.5135 0.385043
CategoricalPrice: [100.0, 201.9] | -22.4024 1.89182 -11.8417 0.000 -26.1135 -18.6912
(Intercept) | 129.814 0.974587 133.199 0.000 127.902 131.726
===========================================================================================
# Dummy regression
Linear Model
========================================================================
Number of obs: 1380 Degrees of freedom: 2
R2: 0.095 R2 Adjusted: 0.093
F-Stat: 71.9556 p-value: 0.000
========================================================================
Sales | Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
------------------------------------------------------------------------
DummyPrice: 2 | -4.5169 2.76519 -1.63349 0.103 -9.94134 0.907545
DummyPrice: 3 | -22.6697 1.89169 -11.9838 0.000 -26.3806 -18.9588
(Intercept) | 129.814 0.973439 133.356 0.000 127.904 131.723
========================================================================
# CODE
using DataFrames, RDatasets, FixedEffectModels, CategoricalArrays, StatsBase
# importing data from RDatasets
df = dataset("plm", "Cigar")
# creating categorical price column: bins [0, 75), [75, 100) and [100, max] (extend = true stretches the last bin to the maximum)
c=cut(df.Price,[0;75;100],extend = true)
insertcols!(df,ncol(df)+1,:CategoricalPrice=>c)
# creating dummy price column: 1 = below 75, 2 = from 75 up to and including 100, 3 = above 100
d = map(x -> x>100 ? 3 : (x<75 ? 1 : 2), df[:,"Price"])
insertcols!(df,ncol(df)+1,:DummyPrice=>d)
# regressions
reg(df, @formula(Sales ~ CategoricalPrice)) # running regression with categorical price variable (not sure how to pick base)
reg(df, @formula(Sales ~ DummyPrice); contrasts = Dict(:DummyPrice => DummyCoding())) # running regression with dummy variable (this defaults to base=1)
reg(df, @formula(Sales ~ DummyPrice); contrasts = Dict(:DummyPrice => DummyCoding(base = 3))) # running regression with dummy variable (with base=3 imposed)
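On the second question: the intercepts agree exactly, which suggests the lowest bin is the same in both versions and only the upper two differ. One plausible cause, an assumption worth checking on the real data, is how the break at 100 is handled. `cut` builds left-closed intervals, so a price of exactly 100.0 falls in the top bin, whereas `x > 100 ? 3 : ...` puts it in bin 2. The sketch below uses a few made-up prices (not taken from the Cigar data) to compare the two codings via `levelcode`, and shows a `map` that follows the same left-closed convention as `cut`.

```julia
using CategoricalArrays

# Made-up prices, including values exactly on the break points 75 and 100
prices = [50.0, 75.0, 99.9, 100.0, 150.0]

# cut gives intervals [0, 75), [75, 100), [100, max]; levelcode numbers them 1, 2, 3
cut_bins = levelcode.(cut(prices, [0, 75, 100]; extend = true))

# Original map from the question: 100.0 is not > 100, so it lands in bin 2
map_bins = map(x -> x > 100 ? 3 : (x < 75 ? 1 : 2), prices)

# Map matching cut's left-closed intervals: 100.0 now lands in bin 3
map_bins_fixed = map(x -> x >= 100 ? 3 : (x < 75 ? 1 : 2), prices)

# Positions where the original map disagrees with cut, and whether the fixed map agrees everywhere
println(findall(cut_bins .!= map_bins))
println(cut_bins == map_bins_fixed)
```

Here only the price of exactly 100.0 should be flagged. Running the same comparison on `df.CategoricalPrice` and `df.DummyPrice` would show whether boundary observations account for the whole difference between the two regressions.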