Bug in StatsModels.modelmatrix()

As shown below, modelmatrix() (and also modelcols()) fails when the formula is “quadratic” rather than “linear”:

using DataFrames
using GLM
df = DataFrame(a = 1.0:3.0, b = 2.0:4.0, y = 5.0:7.0)
fml1 = @formula(y ~ 1 + a + b);
fml2 = @formula(y ~ 1 + a * (a + b) + b * b);
modelmatrix(fml1, df);  # fine
modelmatrix(fml2, df);
ERROR: MethodError: no method matching iterate(::ContinuousTerm{Float64})
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:664
  iterate(::Union{LinRange, StepRangeLen}, ::Int64) at range.jl:664
  iterate(::T) where T<:Union{Base.KeySet{var"#s79", var"#s78"} where {var"#s79", var"#s78"<:Dict}, Base.ValueIterator{var"#s77"} where var"#s77"<:Dict} at dict.jl:693
  ...
Stacktrace:
  [1] iterate(::Base.Generator{ContinuousTerm{Float64}, StatsModels.var"#32#33"{NamedTuple{(:a, :b, :y), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}}}}})
    @ Base ./generator.jl:44
  [2] modelcols(t::InteractionTerm{ContinuousTerm{Float64}}, d::NamedTuple{(:a, :b, :y), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}}})
    @ StatsModels ~/.julia/packages/StatsModels/MDeyQ/src/terms.jl:532

Is it possible to get a quick patch for iterate() that solves the problem?

Please advise, thanks.

I’m closer to the problem now. The terms a & a and b & b get the type InteractionTerm{ContinuousTerm{Float64}}, which cannot be handled properly, instead of InteractionTerm{Tuple{ContinuousTerm{Float64}, ContinuousTerm{Float64}}}:

julia> fml2
FormulaTerm
Response:
  y(unknown)
Predictors:
  1
  a(unknown)
  b(unknown)
  a(unknown) & a(unknown)
  a(unknown) & b(unknown)
  b(unknown) & b(unknown)

julia> typeof(sfml2.rhs.terms[1])
InterceptTerm{true}

julia> typeof(sfml2.rhs.terms[2])
ContinuousTerm{Float64}

julia> typeof(sfml2.rhs.terms[3])
ContinuousTerm{Float64}

julia> typeof(sfml2.rhs.terms[4])
InteractionTerm{ContinuousTerm{Float64}}

julia> typeof(sfml2.rhs.terms[5])
InteractionTerm{Tuple{ContinuousTerm{Float64}, ContinuousTerm{Float64}}}

julia> typeof(sfml2.rhs.terms[6])
InteractionTerm{ContinuousTerm{Float64}}
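
The listing above uses sfml2, which is never defined in the post. A minimal sketch for reproducing the inspection, assuming sfml2 is fml2 with the data schema applied through StatsModels' schema and apply_schema (the exact types printed may differ between StatsModels versions):

```julia
using DataFrames
using StatsModels

df = DataFrame(a = 1.0:3.0, b = 2.0:4.0, y = 5.0:7.0)
fml2 = @formula(y ~ 1 + a * (a + b) + b * b)

# Assumption: sfml2 in the listing above was built this way.
# Applying the schema turns the untyped terms into concrete ones
# (ContinuousTerm, InteractionTerm, ...).
sfml2 = apply_schema(fml2, schema(fml2, df))

# Print each right-hand-side term next to its concrete type.
for t in sfml2.rhs.terms
    println(t, " => ", typeof(t))
end
```

Any interaction whose type parameter is a single ContinuousTerm rather than a Tuple is one that modelcols() tries, and fails, to iterate over.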

So, for the time being, we need to create the square terms explicitly, like:

fml3 = @formula(y ~ 1 + a + (a ^ 2) + (a & b) + b + (b ^ 2));

julia> modelmatrix(fml3, df)
3×6 Matrix{Float64}:
 1.0  1.0  1.0  2.0   4.0   2.0
 1.0  2.0  4.0  3.0   9.0   6.0
 1.0  3.0  9.0  4.0  16.0  12.0

in order to avoid the problem.
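
Another way to sidestep the self-interaction, sketched here on the assumption that adding columns to the data is acceptable, is to precompute the squares so the formula only contains plain terms. The column names a2 and b2 and the name fml4 are made up for illustration:

```julia
using DataFrames
using GLM

df = DataFrame(a = 1.0:3.0, b = 2.0:4.0, y = 5.0:7.0)

# Hypothetical helper columns holding the squared predictors.
df.a2 = df.a .^ 2
df.b2 = df.b .^ 2

# No variable interacts with itself here, so no a & a or b & b term is created.
fml4 = @formula(y ~ 1 + a + b + a2 + (a & b) + b2)
X = modelmatrix(fml4, df)
```

This gives a 3×6 matrix with the same six kinds of columns as fml3, though the column order may differ.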

It's better to file this kind of bug directly on GitHub. Cc: @dave.f.kleinschmidt


Done here.

The remaining problem is that I don’t know how to “Cc” someone on GitHub…
