The short answer is no, not currently. The reasons are a little arcane and I honestly still struggle to find the language to explain the difference, but here goes 🙂
In StatsModels formula world, there are two domains that functions can operate on: they can take in terms and return new terms, or they can operate on values. All `FunctionTerm`s are of the second type: they take in data values and return new data values at the point when the model matrix is constructed (e.g., in `modelcols`). Everything else is of the first type, including `lag`: it takes in a term and returns a new term that wraps it. This distinction is necessary because, in general, many transformations are “stateful”: you can’t lag a term without having the whole column available, or at least without keeping track of some state (which hasn’t been implemented yet, FWIW…). The same goes for a categorical embedding (need to know the levels), a spline basis function (need to know the knots), etc. `FunctionTerm` is a kind of special-case catch-all for anything that ISN’T stateful and can be computed elementwise.
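To make the elementwise vs. stateful distinction concrete, here is a minimal sketch in plain Julia (no StatsModels involved). `naive_lag` is a hypothetical helper made up for illustration, not the `lag` that StatsModels uses, and it assumes an ordinary 1-based vector:

```julia
# An elementwise transformation only ever needs one value at a time,
# so it can be applied row by row without any extra state.
col = [1.0, 2.0, 4.0, 8.0]
logged = log2.(col)

# A lag needs the whole column: row i depends on row i - n.
# `naive_lag` is a hypothetical helper for illustration only;
# it assumes a standard 1-based vector.
function naive_lag(x::AbstractVector, n::Integer)
    return [i > n ? x[i - n] : missing for i in eachindex(x)]
end

lagged = naive_lag(col, 1)
```

The first kind of computation is what fits inside a `FunctionTerm`; the second is why `lag` has to work at the level of terms instead.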
However, the real reason this isn’t possible currently has more to do with how `FunctionTerm`s are implemented efficiently: as soon as a non-special call is encountered in the AST (e.g., an `identity` wrapper), a single anonymous function is created that will evaluate the whole subtree. What that means is that any later term-level calls like `lag` (further down that subtree) are “protected” from any special interpretation, since they’re now moved inside the body of an anonymous function that will only be called during `modelcols`.
HOWEVER, https://github.com/JuliaStats/StatsModels.jl/pull/183 changes this, and one of the consequences is that the “protected”/“unprotected” distinction that was previously implicit can be explicitly controlled. For your case you’d be able to do something like `~ protect(unprotect(lag(x, 1)) + unprotect(lag(x, 2)) + unprotect(lag(x, 3)) + ...)`. The `protect` has the effect of “blocking” normal formula syntax, so `+` becomes literal `+` (applied elementwise to the columns returned by the inner terms). The `unprotect` is necessary to lift the function calls back to operating on the level of terms (instead of elements), since `lag(t::Term, deg)` is what’s happening internally to create a `LagTerm`.
That syntax is a bit clunky for this use case, and I’m honestly not sure it’d work with run-time creation. Another consequence of the overhaul is that you can actually construct `FunctionTerm`s at run time, so you could MAYBE do something like `FunctionTerm(+, [unprotect(lag(ts, i)) for i in 1:n]...)`.
In any case, could you comment on #183? It’s useful to have a record of when these sorts of features would be useful, and to stress-test the proposed implementation and syntax!
I’d say that the short-term solution would just be to create your own wrapper term type (like `cumlag` or something) which generates the `LagTerm`s and has a special method for `modelcols`… I could mock something up for that later if it would be useful (feel free to ping), or you could refer to the `poly` example in the docs, or to how `lag` is handled.
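In the meantime, here is a rough sketch of the value-level computation that such a wrapper's `modelcols` method would need to do, in plain Julia only. It deliberately leaves out the StatsModels side (the term type, `apply_schema`, and the `modelcols` method itself), and both `shift_down` and `cumlag_values` are hypothetical names invented for this example:

```julia
# Hypothetical helper: shift a column down by n rows, padding with `missing`.
# Assumes a standard 1-based vector.
shift_down(x::AbstractVector, n::Integer) =
    [i > n ? x[i - n] : missing for i in eachindex(x)]

# Hypothetical value-level core of a `cumlag`-style term: the elementwise
# sum of lags 1 through n of a single column. Rows that do not have a full
# n-row history come out as `missing`.
function cumlag_values(x::AbstractVector, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    return reduce((a, b) -> a .+ b, [shift_down(x, k) for k in 1:n])
end

x = [1, 2, 3, 4, 5]
summed = cumlag_values(x, 2)   # lag 1 plus lag 2, row by row
```

A real wrapper term would hold the inner term and the number of lags, and its `modelcols` method would produce a column like `summed` from the data; the `poly` example in the docs is the place to check how those pieces fit together.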