I’m working on a project trying to use @formula
in a fairly novel way, for example, if I wanted to use GLM.jl
to fit a linear model with two predictors and their interaction I would use something like this:
f = @formula(y ~ 1 + x_1*x_2)
model = glm(f, data, Normal(), IdentityLink())
Assuming I have a vector β
prespecified, what I would like to do is be able to extract the regression equation out of f
so that I could get a function that behaved like:
function f!(y::T, x_1::T, x_2::T, β::Vector{T}) where {T<:Real}
y = β[1] + β[2] * x_1 + β[3] * x_2 + β[4] * x_1 * x_2
return y
end
I need to do this so that I can have a function in terms of the predictors and the regression coefficients β
that I can pass to automatic differentiation tools like ForwardDiff
. I’m not even sure if this is possible using StatsModels
and Terms 2.0. If anyone had any idea about how to implement this I would be excited to see!
2 Likes
That’s actually basically what the predict
function does, except that it’s not in place, and takes a table/row instead of positional arguments. If you want to do that then you can do something like
function f_factory(f::FormulaTerm)
function f!(y, xs::NamedTuple, β)
y[] = dot(modelcols(f, xs), β)
end
end
(note that if you want to make it “in place” you need to pass a Ref(y)
or something, you can’t assign to a Real
in place)
The specific thing you’re asking for is harder, but you can do it (with some restrictions) by “re-packing” the x values into a named tuple, something like
function f_factory(f::FormulaTerm)
function f!(y, xs::NamedTuple, β)
y[] = dot(modelcols(f, xs), β)
end
names = (StatsModels.termvars(f.rhs)..., )
f!(y, xs, β) = f!(y, NamedTuple{names}((xs..., )), β)
end
However, this will only work with continuous values, because categorical variables get converted into some kind of numeric form that can’t just be stuffed back into a named tuple and run through modelcols
again.
I should add, both of those solutions assume you can take the xs
as a tuple or a vector or something. If you really need the variadic form then you could do something like
function f_factory(f::FormulaTerm)
function f!(y, xs::NamedTuple, β)
y[] = dot(modelcols(f, xs), β)
end
names = (StatsModels.termvars(f.rhs)..., )
f!(y, β::Vector, xs...) = f!(y, NamedTuple{names}((xs..., )), β)
end
and I fiiiinnnally read the last bit of your post and saw that you want this to use with autodiff and so none of these solutions are likely to work :-/ Although I suppose there’s no harm in trying.
The other thing I’d suggest is to write a loop inside the closure that iterates over the terms and accumulates the results