Extending @formula syntax is difficult to understand

Extending @formula syntax in StatsModels.jl is difficult to understand.
I read the poly example in the official documentation, but the comments are not detailed enough for me, and I still don’t quite understand it.
For example, why define StatsModels.apply_schema, and once it is defined, how does it get called by f = apply_schema(f, schema(data))? Can you give a detailed and clear walkthrough?

It’s hard to know where to start without more details about what kind of custom syntax you’re trying to implement and why!

The docs on the “lifecycle of a formula” give a pretty complete explanation of each step from @formula to a fitted model; here’s the section on apply_schema: Internals and extending the @formula · StatsModels.jl

why define StatsModels.apply_schema

Because that is the mechanism StatsModels.jl provides for implementing custom syntax. In practice, a custom syntax is a method of apply_schema(::FunctionTerm{typeof(my_special_syntax_function)}, ...).
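To make that concrete, here is a minimal sketch of such a method. It assumes StatsModels 0.7, where a FunctionTerm stores its arguments in an `args` field; `my_special_syntax_function` and `MySpecialTerm` are hypothetical names made up for illustration, not part of the package.

```julia
using StatsModels

# Hypothetical function that marks the special syntax inside @formula.
my_special_syntax_function(x) = x

# Hypothetical term type that the special syntax is turned into.
struct MySpecialTerm{T} <: AbstractTerm
    term::T
end

# This method runs whenever apply_schema reaches a
# `my_special_syntax_function(...)` call inside a formula.
function StatsModels.apply_schema(t::FunctionTerm{typeof(my_special_syntax_function)},
                                  sch::StatsModels.Schema,
                                  Mod::Type)
    # Resolve the argument (here Term(:x)) against the schema first.
    inner = apply_schema(only(t.args), sch, Mod)
    return MySpecialTerm(inner)
end

data = (y = [1.0, 2.0, 3.0], x = [4.0, 5.0, 6.0])
f = @formula(y ~ my_special_syntax_function(x))

# Applying the schema to the right-hand side alone dispatches to the method above.
rhs = apply_schema(f.rhs, schema(data))
@show rhs isa MySpecialTerm
```

This only covers the apply_schema step. To actually produce model matrix columns, the new term type would also need methods such as StatsModels.modelcols and StatsModels.width, as the poly example in the docs does.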

how is it called in f = apply_schema(f, schema(data))

Like the docs say, when apply_schema is called on a FormulaTerm (or, more generally, any other term that has “children”, like an interaction term, a “tuple term” as generated by +, etc.), the methods for those terms call apply_schema recursively on the children. So if you have a formula like y ~ my_special_function(x) and you call apply_schema on it, it will eventually call apply_schema on the my_special_function(x) bit, which is represented internally as a FunctionTerm{typeof(my_special_function)}(my_special_function, [Term(:x)], ...). So, by defining a method for apply_schema(::FunctionTerm{typeof(my_special_function)}, ...), you can control what happens when that sub-term is processed.
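A rough way to watch this recursion happen is to log inside the custom method. This is only a sketch, again assuming StatsModels 0.7; `SEEN` is a hypothetical helper used for tracing, and the method body is a placeholder that simply treats my_special_function(x) like plain x.

```julia
using StatsModels

# Hypothetical special function from the formula y ~ my_special_function(x).
my_special_function(x) = x

# Records each sub-term that reaches the custom method (for illustration only).
const SEEN = Any[]

function StatsModels.apply_schema(t::FunctionTerm{typeof(my_special_function)},
                                  sch::StatsModels.Schema,
                                  Mod::Type)
    push!(SEEN, t)
    # Placeholder behaviour: treat my_special_function(x) like plain x.
    return apply_schema(only(t.args), sch, Mod)
end

data = (y = [1.0, 2.0, 3.0], x = [4.0, 5.0, 6.0])
f = @formula(y ~ my_special_function(x))

# Before any schema is applied, the right-hand side is stored as a FunctionTerm.
@show f.rhs isa FunctionTerm{typeof(my_special_function)}
@show f.rhs.args

# Calling apply_schema on the whole formula recurses into its children ...
f = apply_schema(f, schema(data))

# ... so the custom method is reached without ever being called directly.
@show length(SEEN)
```

Nothing in user code calls the custom method by name; it is picked by dispatch when the FormulaTerm method hands its right-hand side back to apply_schema.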

The internal design of StatsModels.jl is indeed somewhat complicated. Depending on what you are trying to do, it may not be necessary to deal with the @formula at all if the benefit is small.


Thank you for your answer. I had also guessed that each term of the formula would call apply_schema, but I had nothing to back that up, so I could only ask for help on the forum. I may come back later with more questions about some details of the poly example.