That’s right, this isn’t currently supported. But with the power of Terms 2.0 you could implement this behavior fairly easily by overloading apply_schema(t::FunctionTerm{typeof(^)}, sch, ctx)
to generate the interactions you want. If you’re willing to take a dependency on Combinatorics.jl then you can do it really easily. The basic idea is to return a tuple of the main effect and interaction terms you’d like to include (I’ve used &
to create the interaction terms because it’s smart about treating a one-element interaction term as a no-op).
using StatsModels, Combinatorics, Iterators
combinations_upto(x, n) = Iterators.flatten(combinations(x, i) for i in 1:n)
expand_exp(args, deg::ConstantTerm) =
tuple(((&)(terms...) for terms in combinations_upto(args, deg.n))...)
StatsModels.apply_schema(t::FunctionTerm{typeof(^)}, sch::StatsModels.Schema, ctx::Type) =
apply_schema.(expand_exp(t.args_parsed...), Ref(sch), ctx)
StatsModels.apply_schema(t::FunctionTerm{typeof(^)}, sch::StatsModels.FullRank, ctx::Type) =
apply_schema.(expand_exp(t.args_parsed...), Ref(sch), ctx)
(the method for StatsModels.FullRank
is a bit of an unfortunate consequence of the way that the schema is applied for subtypes of StatisticalModel
; it’s necessary to avoid a method ambiguity).
Then you can do things like:
julia> f = @formula(y ~ 1 +(a+b+c+d) ^ 2)
FormulaTerm
Response:
y(unknown)
Predictors:
1
(a,b,c,d)->(a + b + c + d) ^ 2
julia> ff = apply_schema(f, schema(f, d))
FormulaTerm
Response:
y(continuous)
Predictors:
1
a(DummyCoding:2→1)
b(DummyCoding:2→1)
c(DummyCoding:2→1)
d(DummyCoding:2→1)
a(DummyCoding:2→1) & b(DummyCoding:2→1)
a(DummyCoding:2→1) & c(DummyCoding:2→1)
a(DummyCoding:2→1) & d(DummyCoding:2→1)
b(DummyCoding:2→1) & c(DummyCoding:2→1)
b(DummyCoding:2→1) & d(DummyCoding:2→1)
c(DummyCoding:2→1) & d(DummyCoding:2→1)
julia> modelcols(ff.rhs, d)
16×11 Array{Float64,2}:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
1.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0
1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
1.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0
1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0
1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
If you use a formula like this in the context of a statistical model, you’ll get the nice defaults like automatically adding an intercept term if one is not specified, and upgrading the first categorical predictor to “full rank” if no intercept is explicitly specified with 0
:
julia> apply_schema(@formula(y ~ (a+b+c+d)^2), sch, StatisticalModel)
FormulaTerm
Response:
y(continuous)
Predictors:
1
a(DummyCoding:2→1)
b(DummyCoding:2→1)
c(DummyCoding:2→1)
d(DummyCoding:2→1)
a(DummyCoding:2→1) & b(DummyCoding:2→1)
a(DummyCoding:2→1) & c(DummyCoding:2→1)
a(DummyCoding:2→1) & d(DummyCoding:2→1)
b(DummyCoding:2→1) & c(DummyCoding:2→1)
b(DummyCoding:2→1) & d(DummyCoding:2→1)
c(DummyCoding:2→1) & d(DummyCoding:2→1)
julia> apply_schema(@formula(y ~ 0 + (a+b+c+d)^2), sch, StatisticalModel)
FormulaTerm
Response:
y(continuous)
Predictors:
0
a(StatsModels.FullDummyCoding:2→2)
b(DummyCoding:2→1)
c(DummyCoding:2→1)
d(DummyCoding:2→1)
a(DummyCoding:2→1) & b(DummyCoding:2→1)
a(DummyCoding:2→1) & c(DummyCoding:2→1)
a(DummyCoding:2→1) & d(DummyCoding:2→1)
b(DummyCoding:2→1) & c(DummyCoding:2→1)
b(DummyCoding:2→1) & d(DummyCoding:2→1)
c(DummyCoding:2→1) & d(DummyCoding:2→1)
Note that this all requires a slight tweak to StatsModels to support the nested tuples of terms so it won’t work out of the box unless you ] dev StatsModels#dfk/sumcleanup
. You could get around that by creating a special term struct like ExpTerm
that (analogously with InteractionTerm) controls the behavior of this term in different contexts in more detail, but that would require a lot of extra methods (including more specific handling of FullRank
schemas).