ModelFrame contrasts & factors in StatsModels v0.6.x

Hello! I’m confused. In StatsModels 0.5.0 I could get the contrast levels simply with:

ModelFrame.contrasts[:name].levels

How can I do it now?

And how can I get the list of predictors?

OK, I just found how to get it by index:

ModelFrame.f.rhs.terms[2].contrasts.contrasts

How can I get the contrasts by Symbol (column name)?

OK, I put together a solution for this:

# Collect the names of all non-intercept terms on the right-hand side
function findtermnames(MF::ModelFrame)
    a = Symbol[]
    for t in MF.f.rhs.terms
        t isa InterceptTerm && continue
        push!(a, t.sym)
    end
    return Tuple(a)
end

# Find the index of the term with the given name (0 if there is none)
function findterm(MF::ModelFrame, symbol::Symbol)::Int
    for (i, t) in enumerate(MF.f.rhs.terms)
        t isa InterceptTerm && continue
        t.sym == symbol && return i
    end
    return 0
end

# Number of model-matrix columns produced by the named categorical term
function termmodellen(MF::ModelFrame, symbol::Symbol)::Int
    id = findterm(MF, symbol)
    id == 0 && throw(ArgumentError("no term named $symbol"))
    return length(MF.f.rhs.terms[id].contrasts.termnames)
end

Is this the right way?

Is there a built-in function for this that I could not find?

OK!
For example, I can get the coefficient names with coefnames(). How can I get the factor or term for each coefficient in that list?

I posted on the GitHub issue you opened, but I’ll copy my response here for posterity. At a high level, the internal structure of 0.5 and 0.6+ is fundamentally different (you can see the heart of the changes in the Terms 2.0: Son of Terms PR, or read the discourse posts or latest docs).

You can get the ContrastsMatrix for each categorical term like this:

julia> using StatsModels, StatsBase

julia> data = (a = rand(10), b = sample(1:3, 10), c = sample([:a, :b, :c], 10));

julia> f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b=>CategoricalTerm)))
FormulaTerm
Response:
  0
Predictors:
  a(continuous)
  b(DummyCoding:3→2)
  c(DummyCoding:3→2)

julia> contrasts = [t.contrasts for t in terms(f.rhs) if t isa CategoricalTerm]
2-element Array{StatsModels.ContrastsMatrix{DummyCoding,T} where T,1}:
 StatsModels.ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2 => 2,3 => 3,1 => 1))                     
 StatsModels.ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a => 1,:b => 2,:c => 3))

The fields of this struct give you all the information about the contrast coding:

julia> cm = contrasts[end];

julia> cm.matrix
3×2 Array{Float64,2}:
 0.0  0.0
 1.0  0.0
 0.0  1.0

julia> cm.levels
3-element Array{Symbol,1}:
 :a
 :b
 :c

julia> cm.termnames
2-element Array{Symbol,1}:
 :b
 :c

I think by “contrast vectors” you mean the columns of the .matrix field, each of which corresponds to one column in the model matrix.
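To make the link between the contrast matrix and the model matrix concrete, here is a minimal sketch on made-up data (the column names `y` and `c` are hypothetical, not from your model). It assumes that `modelcols` on a CategoricalTerm picks, for each observation, the row of `.matrix` that belongs to that observation's level:

```julia
using StatsModels

# Hypothetical toy data: one categorical column with three levels
data = (y = [1.0, 2.0, 3.0, 4.0], c = [:a, :b, :c, :b])

f = apply_schema(@formula(y ~ c), schema(data))
t = first(t for t in terms(f.rhs) if t isa CategoricalTerm)
cm = t.contrasts

# Position of each observation's level within cm.levels
rowidx = [findfirst(==(x), cm.levels) for x in data.c]

# Model-matrix columns generated by this term alone
X = modelcols(t, data)

# Each row of X is the contrast-matrix row for that observation's level
@assert X == cm.matrix[rowidx, :]
```

So the number of columns in `cm.matrix` is the number of model-matrix columns (and coefficients) that the term contributes.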

If you want a symbol-indexable version of this, you’ll first need to construct a symbol-to-term mapping, like this:

julia> Dict(t.sym => t for t in terms(f) if hasproperty(t, :sym))
Dict{Symbol,AbstractTerm} with 3 entries:
  :a => a
  :b => b
  :c => c

This gives you all the terms, but you can also restrict to CategoricalTerms which have contrasts and extract the ContrastsMatrix structs all in one go:

julia> Dict(t.sym => t.contrasts for t in terms(f) if t isa CategoricalTerm)
Dict{Symbol,StatsModels.ContrastsMatrix{DummyCoding,T} where T} with 2 entries:
  :b => ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2=>2,3=>3,1=>1))
  :c => ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a=>1,:b=>2,:c=>3))
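Putting this together, a rough sketch of a by-name lookup, plus one possible way to pair each coefficient name with the term that generates it. The helper name `contrasts_by_name` and the toy data are my own inventions for illustration, not part of StatsModels. I am also assuming that `coefnames` returns a single String for a one-column term and a Vector for a multi-column one, hence the `vcat`:

```julia
using StatsModels

# Hypothetical toy data; `b` is integer-valued, so it is declared categorical via a hint
data = (a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        b = [1, 2, 3, 1, 2, 3],
        c = [:a, :b, :c, :a, :b, :c])

f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b => CategoricalTerm)))

# Hypothetical helper: contrasts of every categorical term, keyed by column name
contrasts_by_name(f) = Dict(t.sym => t.contrasts for t in terms(f) if t isa CategoricalTerm)

cs = contrasts_by_name(f)
cs[:c].levels   # the same information as mf.contrasts[:c].levels in 0.5

# Pair each coefficient name with the term it comes from.
# vcat normalises a single String and a Vector of Strings to a Vector.
coef_to_term = [name => t for t in terms(f.rhs) for name in vcat(coefnames(t))]
```

Note that `terms` returns the leaf terms, so for a formula with interactions you would probably want to iterate over the top-level terms of the right-hand side instead.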

Also, the ModelFrame struct is largely superfluous for these purposes: all you need is the FormulaTerm, which you can access through the mf.f field. The ModelFrame/ModelMatrix interface is included mostly for backwards compatibility, but it is likely to be dropped in favor of having modeling packages use the @formula, schema, apply_schema, modelcols API directly, which provides more flexibility in how terms are handled (the MixedModels.jl package is a good example of the benefits this brings).
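For reference, a minimal sketch of that direct workflow without any ModelFrame. The data and column names (`y`, `x`, `g`) are made up for illustration, and no model type is passed to `apply_schema`, so the intercept is written explicitly in the formula:

```julia
using StatsModels

# Hypothetical toy data: continuous response and predictor, one two-level factor
data = (y = [1.0, 2.0, 3.0, 4.0],
        x = [0.5, 1.5, 2.5, 3.5],
        g = [:a, :b, :a, :b])

f = @formula(y ~ 1 + x + g)

# Extract the schema for the terms in f from the data, then apply it
f = apply_schema(f, schema(f, data))

# Response vector and predictor matrix
y, X = modelcols(f, data)

# Names of the predictor columns, in the same order as the columns of X
names = coefnames(f.rhs)
```

The concrete terms, including their contrasts, are then available from `f.rhs` exactly as in the examples above.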