Extending the `@formula` syntax to generate functions

I’m working on a project trying to use @formula in a fairly novel way, for example, if I wanted to use GLM.jl to fit a linear model with two predictors and their interaction I would use something like this:

f = @formula(y ~ 1 + x_1*x_2)
model = glm(f, data, Normal(), IdentityLink())

Assuming I have a vector β prespecified, what I would like to do is be able to extract the regression equation out of f so that I could get a function that behaved like:

function f!(y::T, x_1::T, x_2::T, β::Vector{T}) where {T<:Real}
    y = β[1] + β[2] * x_1 + β[3] * x_2 + β[4] * x_1 * x_2
    return y
end

I need to do this so that I can have a function in terms of the predictors and the regression coefficients β that I can pass to automatic differentiation tools like ForwardDiff. I’m not even sure if this is possible using StatsModels and Terms 2.0. If anyone had any idea about how to implement this I would be excited to see!

2 Likes

That’s actually basically what the predict function does, except that it’s not in place, and takes a table/row instead of positional arguments. If you want to do that then you can do something like

function f_factory(f::FormulaTerm)
    function f!(y, xs::NamedTuple, β)
        y[] = dot(modelcols(f, xs), β)
    end
end

(note that if you want to make it “in place” you need to pass a Ref(y) or something, you can’t assign to a Real in place)

The specific thing you’re asking for is harder, but you can do it (with some restrictions) by “re-packing” the x values into a named tuple, something like

function f_factory(f::FormulaTerm)

    function f!(y, xs::NamedTuple, β)
        y[] = dot(modelcols(f, xs), β)
    end

    names = (StatsModels.termvars(f.rhs)..., )
    f!(y, xs, β) = f!(y, NamedTuple{names}((xs..., )), β)

end

However, this will only work with continuous values, because categorical variables get converted into some kind of numeric form that can’t just be stuffed back into a named tuple and run through modelcols again.

I should add, both of those solutions assume you can take the xs as a tuple or a vector or something. If you really need the variadic form then you could do something like

function f_factory(f::FormulaTerm)

    function f!(y, xs::NamedTuple, β)
        y[] = dot(modelcols(f, xs), β)
    end

    names = (StatsModels.termvars(f.rhs)..., )
    f!(y, β::Vector, xs...) = f!(y, NamedTuple{names}((xs..., )), β)

end

and I fiiiinnnally read the last bit of your post and saw that you want this to use with autodiff and so none of these solutions are likely to work :-/ Although I suppose there’s no harm in trying.

The other thing I’d suggest is to write a loop inside the closure that iterates over the terms and accumulates the results