GeneralizedLinearMixedModel no method matching _ranef_refs

How can I fix this error?

using CategoricalArrays, DataFrames, MixedModels

form = @formula(y ~ (1 | a * b * c))
n = 1000
df = DataFrame(
    y=rand(Bool, n),
    a=CategoricalArray(rand('a':'z', n)),
    b=CategoricalArray(rand('a':'z', n)),
    c=CategoricalArray(rand('a':'z', n)),
)
@time fit(GeneralizedLinearMixedModel, form, df, MixedModels.Bernoulli())

julia> @time fit(GeneralizedLinearMixedModel, form, df, MixedModels.Bernoulli())
ERROR: MethodError: no method matching _ranef_refs(::Tuple{StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.InteractionTerm{Tuple{StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}}}, StatsModels.InteractionTerm{Tuple{StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}}}, StatsModels.InteractionTerm{Tuple{StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}}}, StatsModels.InteractionTerm{Tuple{StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}, StatsModels.CategoricalTerm{DummyCoding, Char, 25}}}}, ::NamedTuple{(:y, :a, :b, :c), Tuple{Vector{Bool}, CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}}, CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}}, CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}}}})

The problem is you have a grouping variable a * b * c that isn’t supported. This isn’t supported because I haven’t seen a case where it makes sense statistically. Can you give us more context on the underlying statistical or scientific question you’re trying to address with such a model?

I have a bunch of observations from different groups in different locations and years. Each observation is (id, group, location, year, value). I wanted to estimate the average value in each (group, location, year). The sample size in each (group, location, year) varies, with some being small.

I wanted to get more realistic estimates for the small-size tuples by sharing information with the large ones. For example, (group 1, location 1, year 1) is small so its average value has high variance, but it could be pulled toward the middle by (group 1, location 1, year y), (group 1, location ℓ, year 1), and (group g, location 1, year 1).

Does that seem like a sensible goal? Is there a better way of going about it?

So you want a random intercept for each combination of group, location and year? Then you need a & b & c not a * b * c. The problem with (1|a * b * c) is that it expands to (1|a) + (1|b) + (1|c) + (1|a&b) + (1|a&c) + (1|b&c) + (1|a&b&c) which in most cases won’t be identifiable.
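As a rough sketch of that change, here is the example from the question refit with a single grouping term. The smaller level set ('a':'e' rather than 'a':'z'), the seed, and the names lvls and re are my own choices for illustration, so that each (a, b, c) cell has several observations; they are not from the original post.

```julia
using CategoricalArrays, DataFrames, MixedModels, Random

Random.seed!(1)   # arbitrary seed, only for reproducibility
n = 1000
lvls = 'a':'e'    # assumption: fewer levels than in the question, so cells are not singletons
df = DataFrame(
    y=rand(Bool, n),
    a=CategoricalArray(rand(lvls, n)),
    b=CategoricalArray(rand(lvls, n)),
    c=CategoricalArray(rand(lvls, n)),
)

# One random intercept per observed (a, b, c) combination,
# instead of the seven-term expansion that a * b * c produces.
form = @formula(y ~ 1 + (1 | a & b & c))
model = fit(GeneralizedLinearMixedModel, form, df, Bernoulli())

# Conditional modes of the random intercepts: a 1 × (number of observed cells) matrix.
re = only(ranef(model))
```

Because y here is pure noise, the estimated between-cell variance will likely be close to zero, so the point is only that the model now constructs and fits.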

That hopefully addresses the computational question, but I’m not sure if that’s the ideal way to answer the inferential question – I haven’t had time to think about it deeply and I would probably need more information to give more specific advice.


@jar1 does (1 | a & b & c) give you what you want? That’s what we use for things like the nesting syntax (e.g., (1 | a / b) lowers to (1 | a) + (1 | a & b), IIRC… that might be backwards).

Yes, I think that’s working for me. Thank you both.
