ModelFrame contrast & factors in StatsModels in v"0.6.x"

PharmCat · September 1, 2019, 9:36pm

Hello! I’m confused. In StatsModels 0.5.0 I could get the contrast levels simply with:

ModelFrame.contrasts[:name].levels

How can I do it now?

And how can I get the list of predictors?

PharmCat · September 1, 2019, 9:59pm

Ok, I just found how to get it by number:

ModelFrame.f.rhs.terms[2].contrasts.contrasts

How can I get the contrasts by Symbol (column name)?

PharmCat · September 2, 2019, 9:45pm

Ok, I put together a solution for this:

# Find all factors (symbols of all non-intercept terms)
function findtermnames(MF::ModelFrame)
    a = Symbol[]
    for t in MF.f.rhs.terms
        t isa InterceptTerm && continue
        push!(a, t.sym)
    end
    return Tuple(a)
end
# Find the index of a term by Symbol (returns 0 if no term matches)
function findterm(MF::ModelFrame, symbol::Symbol)::Int
    for (i, t) in enumerate(MF.f.rhs.terms)
        t isa InterceptTerm && continue
        t.sym == symbol && return i
    end
    return 0
end
# Return the number of model matrix columns for the categorical term named by Symbol
function termmodellen(MF::ModelFrame, symbol::Symbol)::Int
    id = findterm(MF, symbol)
    # Guard against indexing with 0 when the symbol is not found
    id == 0 && throw(ArgumentError("no term named $symbol"))
    return length(MF.f.rhs.terms[id].contrasts.termnames)
end

Is this the right way?

Is there a built-in function for this that I could not find?

PharmCat · September 11, 2019, 10:42pm

Ok!
For example, I can get coefficient names with coefnames(). How can I get the factor or term for each coefficient in the list?

dave.f.kleinschmidt · September 28, 2019, 9:21am

I posted on the GitHub issue you opened, but I’ll copy my response here for posterity. At a high level, the internal structure of 0.5 and 0.6+ is fundamentally different (you can see the heart of the changes in the “Terms 2.0: Son of Terms” PR, or read the Discourse posts or the latest docs).

You can get the ContrastsMatrix for each categorical term like this:

julia> using StatsModels, StatsBase

julia> data = (a = rand(10), b = sample(1:3, 10), c = sample([:a, :b, :c], 10));

julia> f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b=>CategoricalTerm)))
FormulaTerm
Response:
  0
Predictors:
  a(continuous)
  b(DummyCoding:3→2)
  c(DummyCoding:3→2)

julia> contrasts = [t.contrasts for t in terms(f.rhs) if t isa CategoricalTerm]
2-element Array{StatsModels.ContrastsMatrix{DummyCoding,T} where T,1}:
 StatsModels.ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2 => 2,3 => 3,1 => 1))                     
 StatsModels.ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a => 1,:b => 2,:c => 3))

The fields of this struct give you all the information about the contrast coding:

julia> cm = contrasts[end];

julia> cm.matrix
3×2 Array{Float64,2}:
 0.0  0.0
 1.0  0.0
 0.0  1.0

julia> cm.levels
3-element Array{Symbol,1}:
 :a
 :b
 :c

julia> cm.termnames
2-element Array{Symbol,1}:
 :b
 :c

I think by “contrast vectors” you mean the columns of the .matrix field, each of which corresponds to one column in the model matrix.

If you want to have a symbol-indexable version of this, then you’ll first need to construct a symbol-to-term mapping, like

julia> Dict(t.sym => t for t in terms(f) if hasproperty(t, :sym))
Dict{Symbol,AbstractTerm} with 3 entries:
  :a => a
  :b => b
  :c => c

This gives you all the terms, but you can also restrict to CategoricalTerms which have contrasts and extract the ContrastsMatrix structs all in one go:

julia> Dict(t.sym => t.contrasts for t in terms(f) if t isa CategoricalTerm)
Dict{Symbol,StatsModels.ContrastsMatrix{DummyCoding,T} where T} with 2 entries:
  :b => ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2=>2,3=>3,1=>1))
  :c => ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a=>1,:b=>2,:c=>3))
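The same loop over terms can also address the earlier question about which term each coefficient comes from. Below is a minimal sketch, assuming that coefnames can be called on an individual term (returning a String for a continuous term and a Vector of Strings for a categorical one); the data and the name coef_to_term are made up for illustration:

```julia
using StatsModels, StatsBase

# Illustrative data: every level of b and c appears at least once
data = (a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        b = [1, 2, 3, 1, 2, 3],
        c = [:a, :b, :c, :a, :b, :c])

f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b => CategoricalTerm)))

# Pair every coefficient name with the symbol of the term that produces it
coef_to_term = Pair{String,Symbol}[]
for t in terms(f.rhs)
    hasproperty(t, :sym) || continue      # skip terms without a symbol (e.g. intercept)
    for name in vcat(coefnames(t))        # vcat turns a single String into a 1-element vector
        push!(coef_to_term, name => t.sym)
    end
end
```

The entries are in the same order as coefnames(f.rhs), so the i-th pair describes the i-th column of the model matrix. Interaction or function terms have no sym field and would need separate handling.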

dave.f.kleinschmidt · September 28, 2019, 9:25am

Also, the ModelFrame struct is largely superfluous for these purposes. All you need is the FormulaTerm, which you can access with the mf.f field. The ModelFrame/ModelMatrix interface is included mostly for backwards compatibility, but is likely to be dropped in favor of having modeling packages use the @formula, schema, apply_schema, modelcols API directly, which provides more flexibility in how terms are handled (the MixedModels.jl package is a good example of the benefits this brings).
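For reference, here is a rough sketch of that direct workflow without any ModelFrame. The data and column names are invented for illustration, and the schema is applied without a model type, so no implicit intercept is added and the categorical column keeps the default DummyCoding:

```julia
using StatsModels

# Illustrative column table (a NamedTuple of vectors)
data = (y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        a = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
        c = ["x", "y", "z", "x", "y", "z"])

# Parse the formula, then attach concrete term types from the data schema
f = apply_schema(@formula(y ~ 1 + a + c), schema(data))

# Generate the response vector and the predictor matrix
y, X = modelcols(f, data)

# Names of the columns of X, in order
names = coefnames(f.rhs)
```

Here X has one column for the intercept, one for a, and two for the dummy-coded c, and f.rhs holds the terms you can inspect as shown in the previous post.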
