I want to get a number of levels of InteractionTerm (not all combination, but real exist) from the fitted model. I canβt get it as the number of columns in model-matrix because it can be made with DummyCoding or FulDummyCoding.
So how I can do this?
Do you mean InteractionTerm instead of InterceptTerm?
indeed⦠my fault⦠edited.
Can you give an example of a toy model and desired output. It will makes things much clearer. Thanks
Hi!
For example, for this data: rds1.csv, I want to make model:
ols = lm(@formula(Var ~ Subject&Sequence+Period+Formulation), berds1)
And then I want to get lsmeans for Intercept, Period and Formulation, for this I should get number of existed levels of Subject&Sequence
.
Perhaps what you are looking for is:
julia> levels(df.Subject)
18-element Vector{Int64}:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
julia> levels(df.Sequence)
2-element Vector{String3}:
"RT"
"TR"
It should be possible to replicate the outputs of lsmeans
without going into the guts of the packages.
If you had the above example in R, which command would you like to run with lsmeans
?
Hi! Thank you!
This factors are nested, and real number of levels is 18, for another data if factors are crossed term Subject&Sequence
can have from 18 to 36 levels, dependent from exist combinations. I want to make extension for GLM.jl - tables of lsmeans
for my work.
Iβm still confused about your question. But the following might help:
using DataFrames
using CSV
using StatsBase
using GLM
using IterTools
csvurl = "https://raw.githubusercontent.com/PharmCat/Metida.jl/master/test/csv/berds2x2/rds1.csv"
d = download(csvurl)
df = CSV.read(d, DataFrame)
ols = lm(@formula(Var ~ Subject&Sequence+Period+Formulation), df)
lsm = DataFrame(
filter(x->!isnan(x[:mean]),
[(Subject=L[1],
Sequence=L[2],
mean=mean(df[(df.Subject .== L[1]).&&(df.Sequence .== L[2]),:Var])
)
for L in Iterators.product(levels(df.Subject), levels(df.Sequence))
]
)
)
which gives a table like this:
18Γ3 DataFrame
Row β Subject Sequence mean
β Int64 String3 Float64
ββββββΌββββββββββββββββββββββββββββ
1 β 1 RT 195.615
2 β 2 RT 106.6
3 β 4 RT 181.78
4 β 7 RT 142.11
5 β 9 RT 124.28
6 β 10 RT 123.02
7 β 12 RT 233.29
8 β 16 RT 143.625
9 β 17 RT 188.66
10 β 3 TR 233.52
11 β 5 TR 143.285
12 β 6 TR 111.15
13 β 8 TR 51.645
14 β 11 TR 104.18
15 β 13 TR 155.275
16 β 14 TR 180.525
17 β 15 TR 190.26
18 β 18 TR 121.745
There are only 18 combinations of Subject
and Sequence
.
I will have try to explain. I have lm model from GLM. And I want to calculate lsmeans
. lsmeans
is a linear combination of coefficients, so lsmeans = Ξ²' * l
where Ξ² - coef vector and l-vector is a specific vector. If i want to get lsmean
for Sequence:TR
I should stay in l-vecto 1
for TR
level of factor Sequence
, and 0
for other levels of sequence
, and 1/n for other coefficients where n - number of levels of factor from coefficient come.
First term (factor) is Intercept - it is easy - number of levels = 1, if continuous number of levels = 1, for categorical it is easy too: length(obj.mf.f.rhs.terms[i].contrasts.levels)
, but for InteractionTerm I have problem because if I just take product of all factor levels it will be not number of real combination. From other side i can take number of levels from obj.mm.assign
, then get number of non-zero coefficient and add 1, but I think it wonβt wort for all cases, and I suppose it will depend on coding method. So my question - is StatsModels have method to get real number of levels for InteractionTerm? If not it is very bad to recalculate this from data.
Another way to calculate the number of real number of levels is from the model matrix:
reallevels = unique(eachrow(@views ols.mm.m[:,4:5]))
length(reallevels)
is 18.
The colunm numbers can be seen from assignment in ols.mm.assign
.
julia> ols.mm.assign
5-element Vector{Int64}:
1
2
3
4
4
The 4
is from ols.mf.f
. Where we see, the 4
th predictor is the interaction term:
julia> ols.mf.f
FormulaTerm
Response:
Var(continuous)
Predictors:
1
Period(continuous)
Formulation(DummyCoding:2β1)
Subject(continuous) & Sequence(StatsModels.FullDummyCoding:2β2)
But, calculating from model matrix is not far away from calculating from original data, as the matrix has the same number of lines.
Perhaps Iβm still not getting it, but if you find a solution, it will be nice if you post it.
Hi! This function is not based on StatsModels, but return real number of levels for InteractionTerm:
using StatsModels
function levelnum(mf, n)
term = mf.f.rhs.terms[n]
isa(term, InteractionTerm) || error("This term is not InteractionTerm")
names = Tuple(t.sym for t in term.terms)
cols = mf.data[names]
length(unique(zip(cols...)))
end