Comparing formulas with a random-effects term [StatsModels.jl]

The following comparison between two formulas with a random-effects term for MixedModels.jl returns false:

@formula(y ~ 1 + x + (1 | z3) ) == @formula(y ~ 1 + x + (1 | z3) ) 

However, comparing two formulas without a random-effects term returns true, as expected.

@formula(y ~ 1 + x ) == @formula(y ~ 1 + x ) 

Is there any solution for this?

I’m not very familiar with StatsModels, so I don’t have a complete answer, but my expectation is that the equality comparison is not (solely) comparing the structural similarity of the models, and that random effects terms from 2 different formulas can’t be guaranteed equal (they are random, after all) even if they are structurally similar.

What are you trying to achieve by comparing formulas?

Thanks for your reply!
I am running a simulation with many formulas stored in a vector.
I want to delete the true model from the candidates if it is among them.

PS. I could do this by converting the formulas to strings, but I asked because the behavior is not intuitive ^^.
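A minimal sketch of the string-based workaround mentioned above, assuming that `string` of a formula reproduces its printed form and so depends only on the formula's structure. The helper name `drop_formula` and the contents of `candidates` are hypothetical and only serve as an illustration:

```julia
using StatsModels

# Two formulas built from identical expressions.
true_model = @formula(y ~ 1 + x + (1 | z3))
candidates = [@formula(y ~ 1 + x), @formula(y ~ 1 + x + (1 | z3))]

# Hypothetical helper: drop every candidate whose printed form matches the target.
# This compares text, not terms, so it is sensitive to term order.
drop_formula(fs, target) = filter(f -> string(f) != string(target), fs)

remaining = drop_formula(candidates, true_model)
println(remaining)
```

Because this is a text comparison, `y ~ 1 + x` and `y ~ x + 1` would count as different formulas.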

Are the formulas generated or manually created? If the latter, it might be more convenient to store them in something with a named-index kind of interface (e.g. a Dict), with a meaningful name for each formula.

I agree that it’s not very intuitive, and based on that my gut impression is that there might be better/more appropriate approaches than trying to directly compare formulas. Are you comparing nested models and/or doing a stepwise regression (e.g. with a likelihood ratio test)?

Thanks a lot

I am now trying to use a Dict or a Tuple to make it work ^^.
The simulation is about model averaging in mixed models.

I think that comparison should return true. I would file an issue at MixedModels.jl.

Executing:

f1 = @formula(y ~ 1 + x + (1 | z3) )
dump(f1)
f2 = @formula(y ~ 1 + x + (1 | z3) )
dump(f2)

It looks like the FunctionTerm is different in the two formulas.
The documentation (API documentation · StatsModels.jl) indicates that the fanon field of a FunctionTerm is a generated anonymous function. So I would guess that since the formulas are created twice, the FunctionTerms are generated twice too, and the comparison is then unable to detect that they are the same.
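The guess above can be illustrated in plain Julia, without StatsModels. This is only a sketch of the general behavior of anonymous functions, not of the package internals; the names `f` and `g` are made up for the example:

```julia
# Each anonymous function literal gets its own type, and == on functions
# falls back to ===, so two separately written copies are never equal.
f = x -> x + 1
g = x -> x + 1

println(f == g)   # false: same body, but two distinct function objects
println(f == f)   # true: the very same object
```

Any struct that stores such a function and compares field by field would therefore report two structurally identical terms as different.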


Yeah, the problem is that (1 | z3) is interpreted as a call to a custom function. It’s only after MixedModels calls apply_schema that these are detected as random-effects terms.

@dave.f.kleinschmidt Maybe == should ignore the anonymous function fields and only compare the syntax?

Yeah, that’s what should happen… The == method for the FunctionTerm itself is actually there already, but the PR for general == of terms got bogged down, so the formula term is still checking ===. I’ll have a go at updating that PR (the person who opened it seems to have deleted their GitHub account, so I’ll have to open a new one): https://github.com/JuliaStats/StatsModels.jl/pull/241

Thanks so much


Thanks for the nudge, I’d forgotten about that PR.