Hello! I’m confused. In StatsModels 0.5.0 I could get the contrast levels simply:
ModelFrame.contrasts[:name].levels
How can I do it now?
And how can I get the list of predictors?
Ok, I just found how to get it by number:
ModelFrame.f.rhs.terms[2].contrasts.contrasts
How can I get the contrasts by Symbol (column name)?
Ok, I put together a solution for this:
using StatsModels

# Find all factors: collect the symbol of every non-intercept term
function findtermnames(MF::ModelFrame)
    a = Symbol[]
    for t in MF.f.rhs.terms
        t isa InterceptTerm && continue
        push!(a, t.sym)
    end
    return Tuple(a)
end

# Find by Symbol: return the index of the matching term, or 0 if there is none
function findterm(MF::ModelFrame, symbol::Symbol)::Int
    for (i, t) in enumerate(MF.f.rhs.terms)
        t isa InterceptTerm && continue
        t.sym == symbol && return i
    end
    return 0
end

# Return length by Symbol: number of contrast columns of a categorical term
function termmodellen(MF::ModelFrame, symbol::Symbol)::Int
    id = findterm(MF, symbol)
    # Guard against indexing with 0 when the symbol is not in the formula
    id == 0 && throw(ArgumentError("no term named $symbol"))
    return length(MF.f.rhs.terms[id].contrasts.termnames)
end
Is this the right way?
Is there a built-in function for this that I could not find?
Ok!
For example, I can get coefficient names with coefnames(). How can I get the factor or term for each coefficient in the list?
I posted on the GitHub issue you opened, but I’ll copy my response here for posterity.
At a high level, the internal structure of 0.5 and 0.6+ is fundamentally different (you can see the heart of the changes in the Terms 2.0: Son of Terms PR, or read the Discourse posts or the latest docs).
You can get the ContrastsMatrix for each categorical term like this:
julia> using StatsModels, StatsBase
julia> data = (a = rand(10), b = sample(1:3, 10), c = sample([:a, :b, :c], 10));
julia> f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b=>CategoricalTerm)))
FormulaTerm
Response:
0
Predictors:
a(continuous)
b(DummyCoding:3→2)
c(DummyCoding:3→2)
julia> contrasts = [t.contrasts for t in terms(f.rhs) if t isa CategoricalTerm]
2-element Array{StatsModels.ContrastsMatrix{DummyCoding,T} where T,1}:
StatsModels.ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2 => 2,3 => 3,1 => 1))
StatsModels.ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a => 1,:b => 2,:c => 3))
The fields of this struct give you all the information about the contrast coding:
julia> cm = contrasts[end];
julia> cm.matrix
3×2 Array{Float64,2}:
0.0 0.0
1.0 0.0
0.0 1.0
julia> cm.levels
3-element Array{Symbol,1}:
:a
:b
:c
julia> cm.termnames
2-element Array{Symbol,1}:
:b
:c
I think by “contrast vectors” you mean the columns of the .matrix field, each of which corresponds to one column in the model matrix.
If you want to have a symbol-indexable version of this, then you’ll first need to construct a symbol-to-term mapping, like
julia> Dict(t.sym => t for t in terms(f) if hasproperty(t, :sym))
Dict{Symbol,AbstractTerm} with 3 entries:
:a => a
:b => b
:c => c
This gives you all the terms, but you can also restrict to CategoricalTerms, which have contrasts, and extract the ContrastsMatrix structs all in one go:
julia> Dict(t.sym => t.contrasts for t in terms(f) if t isa CategoricalTerm)
Dict{Symbol,StatsModels.ContrastsMatrix{DummyCoding,T} where T} with 2 entries:
:b => ContrastsMatrix{DummyCoding,Int64}([0.0 0.0; 1.0 0.0; 0.0 1.0], [2, 3], [1, 2, 3], DummyCoding(nothing, nothing), Dict(2=>2,3=>3,1=>1))
:c => ContrastsMatrix{DummyCoding,Symbol}([0.0 0.0; 1.0 0.0; 0.0 1.0], Symbol[:b, :c], Symbol[:a, :b, :c], DummyCoding(nothing, nothing), Dict(:a=>1,:b=>2,:c=>3))
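On the related question of which term produced each coefficient name, here is a minimal sketch. It assumes that coefnames has methods for individual schema-applied terms (returning a String for single-column terms and a vector for multi-column ones), which I believe is the case in StatsModels 0.6. The helper name coef_term_pairs and the small data set are made up for illustration.

```julia
using StatsModels, StatsBase

# Hypothetical example data (deterministic, so every level is present)
data = (a = rand(9), b = repeat(1:3, 3), c = repeat([:a, :b, :c], 3))
f = apply_schema(@formula(0 ~ a + b + c), schema(data, Dict(:b => CategoricalTerm)))

# Hypothetical helper: pair each coefficient name with the term that generates it.
# `rhs` is assumed to be the MatrixTerm on the right-hand side of an applied formula.
function coef_term_pairs(rhs)
    out = Pair{String,AbstractTerm}[]
    for t in rhs.terms
        cn = coefnames(t)
        # Normalize: single-column terms give one String, others give a vector
        for name in (cn isa AbstractString ? [cn] : cn)
            push!(out, String(name) => t)
        end
    end
    return out
end

result = coef_term_pairs(f.rhs)
```

Each entry of result is a coefname => term pair, so last(p).sym gives the column name for any term that has a sym field.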
Also, the ModelFrame struct is largely superfluous for these purposes; all you need is the FormulaTerm, which you can access with the mf.f field. The ModelFrame/ModelMatrix interface is included mostly for backwards compatibility, but is likely to be dropped in favor of having modeling packages use the @formula, schema, apply_schema, modelcols API directly, which provides more flexibility in how terms are handled (the MixedModels.jl package is a good example of the benefits this brings).
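As a rough sketch of what using that API directly can look like without a ModelFrame: the data and variable names below are invented for illustration, and the formula uses an explicit intercept because no model type is passed to apply_schema.

```julia
using StatsModels

# Hypothetical example data as a NamedTuple of columns
data = (y = rand(9), a = rand(9), c = repeat([:a, :b, :c], 3))

f = @formula(y ~ 1 + a + c)

# Extract a schema (continuous vs. categorical, levels, etc.) from the data
sch = schema(f, data)

# Turn the abstract formula into concrete terms using that schema
f_applied = apply_schema(f, sch)

# Generate the response vector and the predictor matrix
y, X = modelcols(f_applied, data)
```

The categorical terms and their ContrastsMatrix structs are then available from f_applied.rhs in the same way as shown above for mf.f.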