Having trouble with StatsModels.jl `modelmatrix` to replicate R results

Hi all, I’m trying to generate a model matrix to replicate R results, I’d appreciate some guidance. The input data is the Rail data from the nlme package in R, it has the format below:

> library(nlme)
> data(Rail)
> Rail
Grouped Data: travel ~ 1 | Rail
   Rail travel
1     1     55
2     1     53
3     1     54
4     2     26
5     2     37
6     2     32
7     3     78
8     3     91
9     3     85
10    4     92
11    4    100
12    4     96
13    5     49
14    5     51
15    5     50
16    6     80
17    6     85
18    6     83
> Rail$Rail
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
Levels: 2 < 5 < 1 < 6 < 3 < 4

The following R code sets up the model matrix I want:

> options(contrasts=c("contr.treatment","contr.treatment"))
> Z <- model.matrix(~Rail$Rail-1) ## r.e. model matrix
> Z
   Rail$Rail2 Rail$Rail5 Rail$Rail1 Rail$Rail6 Rail$Rail3 Rail$Rail4
1           0          0          1          0          0          0
2           0          0          1          0          0          0
3           0          0          1          0          0          0
4           1          0          0          0          0          0
5           1          0          0          0          0          0
6           1          0          0          0          0          0
7           0          0          0          0          1          0
8           0          0          0          0          1          0
9           0          0          0          0          1          0
10          0          0          0          0          0          1
11          0          0          0          0          0          1
12          0          0          0          0          0          1
13          0          1          0          0          0          0
14          0          1          0          0          0          0
15          0          1          0          0          0          0
16          0          0          0          1          0          0
17          0          0          0          1          0          0
18          0          0          0          1          0          0
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`Rail$Rail`
[1] "contr.treatment"

I do not know how to replicate this behavior in Julia, I’m having some trouble specifying the contrasts. Can someone help me out?

StatsModels can do this. I cannot find the exact same dataset so I replicated it using the Cigar dataset:

using RDatasets, StatsModels
df = dataset("plm", "Cigar")
x = ModelFrame(
    @formula(Price ~ 1 + State),
    df,
    contrasts = Dict(
        :State => DummyCoding(base=1) # specify which "base" element you want to use here
    ),
)
modelmatrix(x)
# 1380×46 Matrix{Float64}:
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#  ⋮                        ⋮                        ⋮                        ⋮         ⋱       ⋮                        ⋮                        ⋮                        ⋮
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
#  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0

Some differences to note:

  • The R code you posted does not appear to include an intercept, if that is intentional you would specify the formula as @formula(Price ~ 0 + State)
  • The matrix output in Julia is simply a matrix, which does not include column names. You can get the column names with coefnames(x)
2 Likes