Extending `@formula` syntax in StatsModels.jl is difficult to understand.

I read the `poly` example in the official documentation, but I feel that the comments are not detailed enough, and I still don't quite understand it.

For example, why define `StatsModels.apply_schema`, and once it is defined, how does it get called by `f = apply_schema(f, schema(data))`? Can you give a detailed, step-by-step explanation?

It’s hard to know where to start without more details about what kind of custom syntax you’re trying to implement and why!

The docs section on the “lifecycle of a formula” gives a fairly complete explanation of each step from `@formula` to a fitted model; here's the part on `apply_schema`: Internals and extending the `@formula` · StatsModels.jl
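
As a rough orientation, the main stages that section walks through look like this in code. This is a minimal sketch assuming StatsModels v0.7 and a hypothetical NamedTuple `data`; when you call `fit` with a formula, these steps normally happen behind the scenes.

```julia
using StatsModels

data = (y = [1.0, 2.0, 3.0], x = [4.0, 5.0, 6.0])   # hypothetical example table

f = @formula(y ~ x)        # 1. parse: a FormulaTerm built from Term(:y) and Term(:x); no data yet
sch = schema(f, data)      # 2. schema: records what kind of variable each Term is (continuous here)
f = apply_schema(f, sch)   # 3. apply_schema: replaces each Term with its concrete term (ContinuousTerm)
y, X = modelcols(f, data)  # 4. modelcols: produces the response vector and the model matrix
```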

> why define `StatsModels.apply_schema`

Because that is the mechanism StatsModels.jl provides for implementing custom syntax. Custom syntax *is*, in some sense, a method of `apply_schema(::FunctionTerm{typeof(my_special_function)}, ...)`.
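
A minimal sketch of such a method, assuming StatsModels v0.7 (where a `FunctionTerm` keeps its parsed arguments in `t.args`). `my_special_function`, `REACHED` and `data` are hypothetical names for illustration. The method simply treats `my_special_function(x)` as plain `x`, which is enough to show that `apply_schema(f, schema(data))` ends up calling it; a real extension such as the docs' `poly` example would instead return its own term type and define `modelcols` for it.

```julia
using StatsModels

# Hypothetical marker function. Its body is never run here, because the
# formula machinery intercepts the call during apply_schema.
my_special_function(x) = x

const REACHED = Ref(false)   # set to true when dispatch selects our method

# Assumes StatsModels v0.7: the parsed arguments live in `t.args`
# (field names differ in older releases).
function StatsModels.apply_schema(t::FunctionTerm{typeof(my_special_function)},
                                  sch::StatsModels.Schema,
                                  Mod::Type)
    REACHED[] = true
    # Decide what this sub-term becomes: here, just apply the schema to the
    # argument, so my_special_function(x) behaves exactly like x.
    return apply_schema(t.args[1], sch, Mod)
end

data = (y = [1.0, 2.0, 3.0], x = [4.0, 5.0, 6.0])
f = @formula(y ~ my_special_function(x))
f = apply_schema(f, schema(data))   # the recursion reaches the method above
y, X = modelcols(f, data)
```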

> how is it called in `f = apply_schema(f, schema(data))`

As the docs say, when `apply_schema` is called on a `FormulaTerm` (or, more generally, on any term that has “children”, like an interaction term or the “tuple term” generated by `+`), the method for that term calls `apply_schema` recursively on each child. So if you have a formula like `y ~ my_special_function(x)` and you call `apply_schema` on it, it will eventually call `apply_schema` on the `my_special_function(x)` part, which is represented internally as `FunctionTerm{typeof(my_special_function)}(my_special_function, [Term(:x)], ...)`. Because the type of that sub-term includes `typeof(my_special_function)`, Julia's multiple dispatch picks your `apply_schema(::FunctionTerm{typeof(my_special_function)}, ...)` method over the generic fallback. That is how defining the method lets you control what happens when that sub-term is processed, without ever calling it directly.
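
To see the dispatch pattern in isolation, here is a self-contained toy model in plain Julia. Every name in it (`ToyTerm`, `ToyCall`, `toy_apply_schema`, `my_poly`, `CALL_LOG`, ...) is hypothetical and only mimics StatsModels' `Term`, `FunctionTerm` and `FormulaTerm`; it is a sketch of the idea, not the library's implementation. Composite terms recurse into their children, and the method specialized on `ToyCall{typeof(my_poly)}` wins over the generic `ToyCall` method.

```julia
# Toy model of recursive schema application. All names are hypothetical;
# they mimic, but are NOT, StatsModels' Term / FunctionTerm / FormulaTerm.
abstract type ToyTerm end

struct ToyVar <: ToyTerm          # like Term(:x): a bare variable name
    name::Symbol
end
struct ToyConst <: ToyTerm        # like ConstantTerm(2)
    n::Int
end
struct ToyCall{F} <: ToyTerm      # like FunctionTerm{F}: a function call inside the formula
    f::F
    args::Vector{ToyTerm}
end
struct ToyFormula <: ToyTerm      # like FormulaTerm: lhs ~ rhs1 + rhs2 + ...
    lhs::ToyTerm
    rhs::Vector{ToyTerm}
end
struct ToyContinuous <: ToyTerm   # a variable after the schema has been applied
    name::Symbol
end
struct ToyPoly <: ToyTerm         # what the custom syntax is turned into
    term::ToyTerm
    deg::Int
end

const CALL_LOG = String[]         # records which method handled each term, in order

# Leaf: look the variable up in the schema (here just Dict(name => element type)).
function toy_apply_schema(t::ToyVar, sch)
    push!(CALL_LOG, "var $(t.name)")
    sch[t.name] <: Real || throw(ArgumentError("$(t.name) is not numeric"))
    return ToyContinuous(t.name)
end

# Composite: apply the schema to every child. This is the recursion.
function toy_apply_schema(t::ToyFormula, sch)
    push!(CALL_LOG, "formula")
    lhs = toy_apply_schema(t.lhs, sch)
    rhs = ToyTerm[toy_apply_schema(r, sch) for r in t.rhs]
    return ToyFormula(lhs, rhs)
end

# Generic fallback for any function call: recurse into the arguments.
function toy_apply_schema(t::ToyCall, sch)
    push!(CALL_LOG, "generic call $(nameof(t.f))")
    return ToyCall(t.f, ToyTerm[toy_apply_schema(a, sch) for a in t.args])
end

# Custom syntax: a more specific method, selected by dispatch for calls to my_poly.
my_poly(x, n) = x^n               # hypothetical marker function
function toy_apply_schema(t::ToyCall{typeof(my_poly)}, sch)
    push!(CALL_LOG, "custom my_poly")
    deg = t.args[2]
    deg isa ToyConst || throw(ArgumentError("degree must be a literal number"))
    return ToyPoly(toy_apply_schema(t.args[1], sch), deg.n)
end

# y ~ log(a) + my_poly(x, 2)
f = ToyFormula(ToyVar(:y),
               ToyTerm[ToyCall(log, ToyTerm[ToyVar(:a)]),
                       ToyCall(my_poly, ToyTerm[ToyVar(:x), ToyConst(2)])])
g = toy_apply_schema(f, Dict(:y => Float64, :a => Float64, :x => Float64))
println(CALL_LOG)
# ["formula", "var y", "generic call log", "var a", "custom my_poly", "var x"]
```

The log shows the point of the question: only the top-level call is made by hand, and the custom method is reached because the generic formula method recurses and dispatch picks the most specific method for each child.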

The internal design of StatsModels.jl is indeed somewhat complicated. Depending on what you are trying to do, it may not be necessary to deal with `@formula` at all if the benefit is small.
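
For instance, if the goal is something like the docs' `poly` example, one alternative (a sketch under that assumption, with made-up numbers) is to skip custom syntax entirely, build the polynomial columns of the model matrix by hand, and solve the least-squares problem directly:

```julia
using LinearAlgebra

# Hypothetical data generated by y = 1 + x^2
x = [1.0, 2.0, 3.0]
y = [2.0, 5.0, 10.0]

# Intercept column plus the x and x^2 columns that poly(x, 2) would produce
X = hcat(ones(length(x)), x, x .^ 2)

β = X \ y   # least-squares coefficients (exact here: X is square and invertible)
```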

Thank you for your answer. I had also guessed that `apply_schema` gets called on each term of the formula, but I had nothing to base that on, so I asked for help on the forum. I may come back later with questions about some details of the `poly` example.