Hi!
I want to construct a regression formula for a general quadratic model without writing out each interaction term separately. I remember that with the Python statsmodels package I could do something like:
Here, however, the last term is interpreted as a single parameter and not as
@formula(y ~ a^2+b^2+c^2 + a*b+b*c+a*c)
Is there an easy way to account for all combinatorial interactions?
(for 3 parameters the difference is obviously no big deal, but I’m looking for a solution for an arbitrary number of parameters)
I think you can do it via constructing terms directly.
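using StatsModels  # provides Term and the + / & operators on terms
using Base.Iterators
function allinteractions(terms::Vector{Symbol})
    # linear terms: a + b + c
    tmp = sum(Term.(terms))
    # add every pairwise product of the linear terms
    tmp + sum([t1 & t2 for (t1, t2) in Iterators.product(tmp, tmp)])
end
formula = Term(:y) ~ allinteractions([:a, :b, :c])
using GLM
# df is assumed to be a table with columns y, a, b and c
lm(formula, df)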
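One thing to watch with the snippet above: Iterators.product enumerates ordered pairs, so unless duplicates are collapsed somewhere else, both a & b and b & a end up in the formula. The sketch below uses only Base and plain symbols (the name vars is just for illustration) to compare the number of ordered pairs with the number of unordered pairs that a quadratic model actually needs.

```julia
vars = [:a, :b, :c]
n = length(vars)

# All ordered pairs, as Iterators.product produces them: n^2 entries,
# so (:a, :b) and (:b, :a) both appear.
ordered = vec(collect(Iterators.product(vars, vars)))

# Unordered pairs with repetition (i <= j): n(n+1)/2 entries,
# i.e. each square and each cross product exactly once.
unordered = [(vars[i], vars[j]) for i in 1:n for j in i:n]

println(length(ordered))
println(length(unordered))
```

For n predictors the second enumeration is the one that matches the squares plus cross terms written out in the question.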
Based on your solution, the problem can be solved by:
using StatsModels, Combinatorics  # combinations comes from Combinatorics.jl
a = sum(InteractionTerm(term.((only(x),only(x)))) for x ∈ combinations([:a, :b, :c],1))
b = sum((&)(term.(x)...) for x ∈ combinations([:a, :b, :c],2))
c = sum(term(only(x)) for x ∈ combinations([:a, :b, :c],1))
formula = Term(:y) ~ a + b + c + ConstantTerm(1)
Although an even shorter form would be nice indeed.
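In case it helps, the same idea can be wrapped in a small helper that works for any number of predictors. This is only a sketch: quadratic_formula is a hypothetical name, not part of StatsModels, and it assumes all predictors are continuous so that an interaction of a column with itself gives the squared column. It builds the right-hand side with index loops instead of Combinatorics.

```julia
using StatsModels

# Hypothetical helper (not part of StatsModels): intercept, linear terms,
# squares and each pairwise cross term exactly once.
function quadratic_formula(response::Symbol, predictors::Vector{Symbol})
    ts = term.(predictors)
    n = length(ts)
    rhs = StatsModels.AbstractTerm[term(1)]   # intercept
    append!(rhs, ts)                          # linear terms
    for i in 1:n, j in i:n                    # i == j gives the squared terms
        push!(rhs, InteractionTerm((ts[i], ts[j])))
    end
    return StatsModels.FormulaTerm(term(response), Tuple(rhs))
end

formula = quadratic_formula(:y, [:a, :b, :c])
```

The result should be usable the same way as above, e.g. lm(formula, df) with GLM, assuming df has the columns y, a, b and c.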
The gist of it is that you need to implement an apply_schema(::FunctionTerm{typeof(^)}, ...) method that transforms those terms into the representation you want. That’s what’s in this file (along with definitions that allow you to do things like (term(:a) + term(:b) + term(:c))^2 at run time, in addition to @formula(y ~ (a+b+c)^2)).