Is there a way to change the base level of a categorical vector in GLM? It looks like the first level in the vector is chosen as the base level. I have tried to change the order of levels using `levels!`

but it does not seem to make a difference

Check out the use of `Contrasts`

in the `StatModels`

package, which is what generates the model matrix in a GLM model.

Sorry for necroposting, I looked in the docs referred above but couldnβt find a simple example.

How do I specify that the base level in the following example should be `"none"`

(because `m`

comes before `n`

and `s`

, the base level is currently `"many"`

)?

```
using DataFrames, GLM
df = DataFrame((x = x, y = y) for x in ("none", "some", "many") for y in rand(10))
ols = lm(@formula(y ~ x), df)
```

Thanks!

If you convert the variable `x`

to a `CategoricalArray`

you can use `relevel!`

to change the order of the levels.

The StatsModels documentations says the default categorical encoding is DummyEncoding with the first level as the reference level. After constructing the untyped FormulaTerm and applying schema, you can create the ModelFrame and setcontrasts! to choose the contrast coding system and the base level you want:

```
mf = ModelFrame(@formula(y ~ 1 + a + b), df) # create ModelFrame
c1 = DummyCoding(base="a", levels=["a", "b", "c"]) # build desired contrasts
c2 = HelmertCoding(base=2, levels=[1, 2, 3])
setcontrast!(mf, Dict(:a => c1, :b => c2)) # set preferred contrast
```

@dmbatesβs solution is easier if you donβt need to change contrast coding system.

I went with this, and just for posterityβs sake the function is `levels!`

(not `relevel!`

, though I agree it makes more sense).

IIRC you can pass that dict as the `contrasts=`

kw arg in the GLM functionsβ¦

Edit: you also donβt need to specify the levels just to change the base (they are extracted when creating the ContrastsMatrix later)

The docs on this could probably be better to be fair

Yes, thank you. The flexibility of the GLM design allows one to define characteristics of categorical variables at several places.

You should be able to do

```
ols = lm(@formula(y ~ x), df, contrasts = Dict(:x => DummyCoding(base="none")))
```

But that only works on GLM master at the moment (for some reason kwargs werenβt being passed to fitβ¦). For now you can do

```
ols = fit(LinearModel, df, contrasts = Dict(:x => DummyCoding(base="none")))
```

Hereβs an example to prove it works

```
julia> df = DataFrame(y = rand(12), x = repeat(["some", "none", "many"], 4))
12Γ2 DataFrame
β Row β y β x β
β β Float64 β String β
βββββββΌββββββββββββββΌβββββββββ€
β 1 β 0.886704 β some β
β 2 β 0.226046 β none β
β 3 β 0.196006 β many β
β 4 β 0.40502 β some β
β 5 β 0.000587951 β none β
β 6 β 0.395679 β many β
β 7 β 0.773263 β some β
β 8 β 0.260962 β none β
β 9 β 0.795655 β many β
β 10 β 0.757367 β some β
β 11 β 0.401757 β none β
β 12 β 0.732157 β many β
julia> fit(LinearModel, @formula(y ~ x), df, contrasts = Dict(:x => DummyCoding(base="none")))
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
y ~ 1 + x
Coefficients:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
(Intercept) 0.222338 0.112337 1.9792 0.0792 -0.0317865 0.476463
x: many 0.307536 0.158869 1.93578 0.0849 -0.0518506 0.666923
x: some 0.48325 0.158869 3.04182 0.0140 0.123864 0.842637
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
```

Whatβs happening internally here is that the contrasts argument gets passed to `schema`

as a βhintβ about how to code the `:x`

variable:

```
julia> sch = schema(df, Dict(:x => DummyCoding(base="none")))
StatsModels.Schema with 2 entries:
y => y
x => x
julia> coefnames(sch[term(:x)])
2-element Array{String,1}:
"x: many"
"x: some"
julia> f = apply_schema(@formula(y ~ x), sch, RegressionModel)
FormulaTerm
Response:
y(continuous)
Predictors:
1
x(DummyCoding:3β2)
julia> modelcols(f.rhs, df)
12Γ3 Array{Float64,2}:
1.0 0.0 1.0
1.0 0.0 0.0
1.0 1.0 0.0
1.0 0.0 1.0
1.0 0.0 0.0
1.0 1.0 0.0
1.0 0.0 1.0
1.0 0.0 0.0
1.0 1.0 0.0
1.0 0.0 1.0
1.0 0.0 0.0
1.0 1.0 0.0
julia> coefnames(f.rhs)
3-element Array{String,1}:
"(Intercept)"
"x: many"
"x: some"
```

Update: once GLM v.1.3.7 is merged (https://github.com/JuliaRegistries/General/pull/9611) you should be able to use the `lm(::FormulaTerm, ::Table; contrasts=...)`

syntax

Thank you!