How to get number of real levels of InteractionTerm from fitted model (GLM) via StatsModels?

I want to get a number of levels of InteractionTerm (not all combination, but real exist) from the fitted model. I can’t get it as the number of columns in model-matrix because it can be made with DummyCoding or FulDummyCoding.
So how I can do this?

Do you mean InteractionTerm instead of InterceptTerm?

indeed… my fault… edited.

Can you give an example of a toy model and desired output. It will makes things much clearer. Thanks

Hi!

For example, for this data: rds1.csv, I want to make model:

ols  = lm(@formula(Var ~ Subject&Sequence+Period+Formulation), berds1)

And then I want to get lsmeans for Intercept, Period and Formulation, for this I should get number of existed levels of Subject&Sequence.

1 Like

Perhaps what you are looking for is:

julia> levels(df.Subject)
18-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18

julia> levels(df.Sequence)
2-element Vector{String3}:
 "RT"
 "TR"

It should be possible to replicate the outputs of lsmeans without going into the guts of the packages.
If you had the above example in R, which command would you like to run with lsmeans?

Hi! Thank you!
This factors are nested, and real number of levels is 18, for another data if factors are crossed term Subject&Sequence can have from 18 to 36 levels, dependent from exist combinations. I want to make extension for GLM.jl - tables of lsmeans for my work.

I’m still confused about your question. But the following might help:

using DataFrames
using CSV
using StatsBase
using GLM
using IterTools

csvurl = "https://raw.githubusercontent.com/PharmCat/Metida.jl/master/test/csv/berds2x2/rds1.csv"
d = download(csvurl)
df = CSV.read(d, DataFrame)
ols  = lm(@formula(Var ~ Subject&Sequence+Period+Formulation), df)
lsm = DataFrame(
  filter(x->!isnan(x[:mean]),
    [(Subject=L[1], 
      Sequence=L[2], 
      mean=mean(df[(df.Subject .== L[1]).&&(df.Sequence .== L[2]),:Var])
     ) 
     for L in Iterators.product(levels(df.Subject), levels(df.Sequence))
    ]
  )
)

which gives a table like this:

18Γ—3 DataFrame
 Row β”‚ Subject  Sequence  mean    
     β”‚ Int64    String3   Float64 
─────┼────────────────────────────
   1 β”‚       1  RT        195.615
   2 β”‚       2  RT        106.6
   3 β”‚       4  RT        181.78
   4 β”‚       7  RT        142.11
   5 β”‚       9  RT        124.28
   6 β”‚      10  RT        123.02
   7 β”‚      12  RT        233.29
   8 β”‚      16  RT        143.625
   9 β”‚      17  RT        188.66
  10 β”‚       3  TR        233.52
  11 β”‚       5  TR        143.285
  12 β”‚       6  TR        111.15
  13 β”‚       8  TR         51.645
  14 β”‚      11  TR        104.18
  15 β”‚      13  TR        155.275
  16 β”‚      14  TR        180.525
  17 β”‚      15  TR        190.26
  18 β”‚      18  TR        121.745

There are only 18 combinations of Subject and Sequence.

I will have try to explain. I have lm model from GLM. And I want to calculate lsmeans. lsmeans is a linear combination of coefficients, so lsmeans = Ξ²' * l where Ξ² - coef vector and l-vector is a specific vector. If i want to get lsmean for Sequence:TR I should stay in l-vecto 1 for TR level of factor Sequence, and 0 for other levels of sequence, and 1/n for other coefficients where n - number of levels of factor from coefficient come.
First term (factor) is Intercept - it is easy - number of levels = 1, if continuous number of levels = 1, for categorical it is easy too: length(obj.mf.f.rhs.terms[i].contrasts.levels), but for InteractionTerm I have problem because if I just take product of all factor levels it will be not number of real combination. From other side i can take number of levels from obj.mm.assign, then get number of non-zero coefficient and add 1, but I think it won’t wort for all cases, and I suppose it will depend on coding method. So my question - is StatsModels have method to get real number of levels for InteractionTerm? If not it is very bad to recalculate this from data.

Another way to calculate the number of real number of levels is from the model matrix:

reallevels = unique(eachrow(@views ols.mm.m[:,4:5]))

length(reallevels) is 18.

The colunm numbers can be seen from assignment in ols.mm.assign.

julia> ols.mm.assign
5-element Vector{Int64}:
 1
 2
 3
 4
 4

The 4 is from ols.mf.f. Where we see, the 4th predictor is the interaction term:

julia> ols.mf.f
FormulaTerm
Response:
  Var(continuous)
Predictors:
  1
  Period(continuous)
  Formulation(DummyCoding:2β†’1)
  Subject(continuous) & Sequence(StatsModels.FullDummyCoding:2β†’2)

But, calculating from model matrix is not far away from calculating from original data, as the matrix has the same number of lines.

Perhaps I’m still not getting it, but if you find a solution, it will be nice if you post it.

Hi! This function is not based on StatsModels, but return real number of levels for InteractionTerm:

using StatsModels
function levelnum(mf, n)
    term = mf.f.rhs.terms[n]
    isa(term, InteractionTerm) || error("This term is not InteractionTerm")
    names = Tuple(t.sym for t in term.terms)
    cols = mf.data[names]
    length(unique(zip(cols...)))
end
2 Likes