Designing Custom Syntax for @formula

Ross_Boylan · May 3, 2022, 10:09pm

I’m trying to understand the differences between the relatively compact implementation of a custom interpretation of ^ for @formula and the general advice on extending @formula, in particular the example here of how to implement a custom interpretation for poly().

The main source of the difference is that the latter creates a custom Term type, PolyTerm, to hold the expression, and then implements a bunch of methods to deal with that type. In contrast, the code for ^ expands the terms out immediately in apply_schema without any reference to new term types.

Why the differences, and which is a better model to use?

Perhaps without something like a PowerToTerm for ^ it would be harder to construct formulae programmatically?

Thanks.

dave.f.kleinschmidt · May 13, 2022, 3:23pm

I actually go back and forth on this issue. Originally in StatsModels.jl, ALL of the special syntax stuff happened at parse time, inside the macro. So the formula that comes out at the end of a * b has a + b + a&b and no memory of a * b. Recently we’ve started to move more of that into run time by adding methods for things like Base.:*(a::Term, b::Term) = a + b + a & b, rather than having a transformation that works on the Expr that the macro sees. IIRC all that stuff is languishing in https://github.com/JuliaStats/StatsModels.jl/pull/183 and I’ve had some second thoughts in the intervening time. Adding all those methods puts an even bigger burden on the compiler, but using a very different approach would require even more dramatic internal (and possibly external) changes (e.g., it could be hard to have stuff like term(:a) * term(:b) work without defining those methods).
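To make the parse-time vs. run-time distinction concrete, here is a rough sketch. It assumes a released StatsModels.jl where the macro still expands `*` itself, and it builds the run-time version using only `term`, `+`, `&` and `~` (the variable names are just for illustration). It spells the expansion out by hand rather than relying on a `*` method for terms, since that is the part that lives in the unmerged PR.

```julia
using StatsModels

# Parse time: the macro rewrites `a * b` before any Term objects exist,
# so the resulting formula only knows about a, b, and a & b.
f_macro = @formula(y ~ a * b)

# Run time: build the right-hand side from Term objects instead,
# writing out the expansion of `a * b` by hand.
a, b = term(:a), term(:b)
f_runtime = term(:y) ~ a + b + a & b

println(f_macro)
println(f_runtime)
```

Note that `&` binds tighter than `+` in Julia, so `a + b + a & b` is read as `a + b + (a & b)`.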

So, all of which is to say, it’s a design decision, and there’s no obviously correct choice. I’d say that the best starting point is PROBABLY a PolyTerm-like approach, rather than the ^ approach. It’s a lot simpler to get the bookkeeping right if you have a 1-1 match between the input FunctionTerm and the output terms. It’s possible to handle 1-to-many transforms but it can get fiddly (see some of the stuff around / for instance, or how / is handled on the RHS of random effects in MixedModels.jl).
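As a rough illustration of what the 1-1 bookkeeping buys you, here is a stripped-down sketch of a PolyTerm-like type. It is not the full example from the docs: it skips the `apply_schema` step entirely and wraps an already-concrete term by hand, and the struct layout and the toy data are assumptions made for this sketch. The point is only that a single term object owns its own width and its own columns.

```julia
using StatsModels

# One custom term stands in for the whole poly(x, deg) expression.
struct PolyTerm{T} <: AbstractTerm
    term::T    # the wrapped (already concrete) term
    deg::Int   # polynomial degree
end

# The term reports how many model-matrix columns it produces...
StatsModels.width(p::PolyTerm) = p.deg

# ...and generates them: one column per power of the wrapped term.
function StatsModels.modelcols(p::PolyTerm, d::NamedTuple)
    col = modelcols(p.term, d)
    return reduce(hcat, [col .^ n for n in 1:p.deg])
end

# Toy data; the wrapped term is made concrete by hand here.
data = (x = [1.0, 2.0, 3.0],)
x = StatsModels.concrete_term(term(:x), data.x)
p = PolyTerm(x, 2)

cols = modelcols(p, data)   # 3×2 matrix: x and x.^2 side by side
```

With the ^ style of expansion there is no single object like `p` to hang `width` or `coefnames` on, which is where the fiddly bookkeeping comes from.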
