StatsModels R-like formula for a general quadratic model

Hi!
I want to construct a regression formula for a general quadratic model without writing out each interaction term separately. I remember that for the python Statsmodels package I could do something like:

y~np.power(a,2)+np.power(b,2)+np.power(c,2)+(a+b+c)**2

In julia, I thought the analogon could look like

@formula(y ~ a^2+b^2+c^2 +(a+b+c)^2) .

Here, however, the last term is interpreted as a single parameter and not as

@formula(y ~ a^2+b^2+c^2 + a*b+b*c+a*c) 

Is there an easy way to account for all combinatorial interactions?
(for 3 parameters the difference is obviously no big deal, but I’m looking for a solution for an arbitrary number of parameters)

2 Likes

Doesn’t seem to say in the manual Modeling tabular data · StatsModels.jl

But I was curious.

I think you can do it via constructing terms directly.

using Base.Iterators
function allinteractions(terms::Vector{Symbol})
    tmp = sum(Term.(terms))
    tmp + sum([t1&t2 for(t1, t2) in Iterators.product(tmp, tmp)])
end

formula = Term(:y)~allinteractions([:a, :b, :c])
using GLM
lm(formula, df)
3 Likes

Thx, but this formula still needs to be cleaned-up for permutations which it is not accounting for.

1 Like

Just to clarify (as I’m not familiar with Python’s formula syntax), are you looking for

julia> sum(length(x) == 1 ? term(only(x)) : (&)(term.(x)...) for x ∈ combinations([:a, :b, :c]))
a(unknown)
b(unknown)
c(unknown)
a(unknown) & b(unknown)
a(unknown) & c(unknown)
b(unknown) & c(unknown)
a(unknown) & b(unknown) & c(unknown)

but with all terms in the formula squared?

This comes closer, but maybe without the last term, as I would consider it to be of order 3.
So

@formula(y ~ a^2+b^2+c^2 + a&b+b&c+a&c + a + b + c +1)

Based on your solution, the problem can be solved by

a = sum(InteractionTerm(term.((only(x),only(x)))) for x ∈ combinations([:a, :b, :c],1))
b = sum((&)(term.(x)...) for x ∈ combinations([:a, :b, :c],2))
c = sum(term(only(x)) for x ∈ combinations([:a, :b, :c],1))
formula = Term(:y) ~ a + b +c + ConstantTerm(1)

Although an even shorter form would be nice indeed.

1 Like

If you’re feeling brave, there’s a prototype implementation of the ^ syntax you’d like in this not-quite-package: GitHub - kleinschmidt/RegressionFormulae.jl

The gist of it is that you need to implement a apply_schema(::FunctionTerm{typeof(^)}, ...) method that transforms that terms into the representation you want. That’s what’s in this file (along with definitions that allow you to do things like (term(:a) + term(:b) + term(:c))^2 at run time, in addition to @formula(y ~ (a+b+c)^2

Ah, cool. Thx!