My attempt to use the `^` operator from RegressionFormulae doesn't seem to have worked. What am I missing?
The docs just referenced say `(a+b+c)^2` generates `a + b + c + a&b + a&c + b&c`, but not `a&b&c`.
Test code:
```julia
using CategoricalArrays
using StatsModels
using DataFrames
using RegressionFormulae

# Trace a formula through schema creation, schema application, and model-matrix construction.
function follow(f::FormulaTerm, df::DataFrame)
    print("formula: ", f, "\n with terms ", terms(f))
    s = schema(f, df)
    print("\nschema: ", s)
    ts = apply_schema(f, s)
    print("apply_schema: ", ts, " with coefficient names", coefnames(ts))
    x = modelcols(ts.rhs, df)
    print("\nmodelcols: the the result has size ", size(x), " and type ", typeof(x))
    # x is a Matrix{Float64} and I see no labels on it. Nor does the type allow any.
end

df = DataFrame(a=[1, 2, 3], b=[0, 0, 1], c=[0.5, 2.5, 20], d=categorical(["xa", "xa", "xb"]))
follow(@formula(a~(a+b+c)^2), df)
```
Result:
```
formula: a ~ :((a + b + c) ^ 2)
 with terms Term[a, b, c]
schema: StatsModels.Schema with 3 entries:
  c => c
  b => b
  a => a
apply_schema: a ~ :((a + b + c) ^ 2) with coefficient names("a", "(a + b + c) ^ 2")
modelcols: the the result has size (3, 1) and type Matrix{Float64}
```
Expected result:
After `apply_schema`, the coefficient names would reflect the expanded list of terms, and `modelcols` would have more than 1 column (like 7).
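For comparison, here is a minimal sketch of the kind of result I expected, with the expansion written out by hand and run through plain StatsModels (no RegressionFormulae). The variable names `f_manual`, `ts_manual`, `cn`, and `x` are just for illustration, and the data frame repeats the numeric columns from the test code. As far as I understand, no intercept is added when `apply_schema` is called without a model context, so this should give one column per written term; the 7 I mentioned assumes an intercept on top of these.

```julia
using DataFrames
using StatsModels

# Same numeric columns as in the test code (the categorical column is not used here).
df = DataFrame(a=[1, 2, 3], b=[0, 0, 1], c=[0.5, 2.5, 20])

# Hand-written expansion of (a+b+c)^2: main effects plus all two-way interactions.
f_manual = @formula(a ~ a + b + c + a&b + a&c + b&c)

# Apply the schema without a model context, as in the test code.
ts_manual = apply_schema(f_manual, schema(f_manual, df))

cn = coefnames(ts_manual.rhs)      # one name per expanded term
x = modelcols(ts_manual.rhs, df)   # one column per expanded term

println("coefficient names: ", cn)
println("size of model matrix: ", size(x))
```

This is the shape of coefficient list and model matrix I was hoping `(a+b+c)^2` would produce on its own.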