For example, I want to build a multivariate polynomial model for data \{x_{1i},x_{2i},\cdots,x_{mi},y_i\}_{i=1}^N with given degree n.
Thanks for your attention!
GLM.jl and its formulas should get you quite far:
using GLM, StatsModels

vars = [:x1, :x2, :x3, :x4]
n = 2  # maximum interaction degree

# x1 * x2 * ... expands into all interactions; keep only those of degree <= n
rhs = filter(t -> StatsModels.degree(t) <= n, reduce(*, Term.(vars)))
lhs = Term(:y)
poly = FormulaTerm(lhs, rhs)

# Fit your model (`data` is any table with columns x1, ..., x4 and y)
glm(poly, data, Normal())
Please also see the StatsModels.jl documentation if the construction of the formula is unclear.
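If the expand-and-filter step is unclear, you can also just inspect what it produced (using the rhs from the snippet above; with four variables and n = 2 there should be 10 terms, the 4 main effects plus the 6 pairwise interactions):

length(rhs)            # 10
foreach(println, rhs)  # prints each main-effect and interaction term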
vars = [:x1, :x2, :x3, :x4]
In your code the vars are defined one by one. How can I do this if I want to define a larger number, m, of variables?
You could either get them from your data directly
using DataFrames
df = DataFrame(x1 = rand(10), x2 = rand(10), x3 = rand(10), x4 = rand(10), y = rand(10))
vars = setdiff(Symbol.(names(df)), (:y, )) # Don't forget to remove target column
or just create them programmatically
vars = [Symbol("x" * string(i)) for i in 1:10]
PS: If your number of variables is large, you probably need to adapt the above approach to only create the terms up to degree n in the first place; currently it creates all interactions up to order m (of which there are 2^{m}-1) and then filters out the ones with degree > n. This will not work if m is about ten or larger – on the other hand, polynomial regression does not make much sense in that case either.
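As a quick sanity check of that blow-up (my own numbers, not part of the original reply), the full expansion creates one term per non-empty subset of the variables:

m = 10; 2^m - 1   # 1_023 interaction terms already at m = 10
m = 25; 2^m - 1   # 33_554_431 terms: expanding x1 * x2 * ... * x25 is hopeless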
Thank you very much!
Actually, I had already implemented the polynomial regression function I wanted, though in a more complicated way, and was preparing to work on the polynomial regression problem for m = 25 when I saw your remark that polynomial regression does not make much sense in that case.
By the way, do you mean that the accuracy is not reliable or that the computational complexity is too great?
There is too much collinearity between the regressors, e.g. between x^6 and x^7. You need something like an orthogonal polynomial basis system to do this. But even this is generally not worth doing. Higher dimensional polynomial regression is seldom useful.
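To illustrate the collinearity point, here is a small 1-D sketch (my own example, not from the reply above) comparing the conditioning of a monomial design matrix with a Chebyshev one:

using LinearAlgebra

x = range(0, 1, length = 200)
V = [xi^k for xi in x, k in 0:9]                    # monomial (Vandermonde-type) design matrix
C = [cos(k * acos(2xi - 1)) for xi in x, k in 0:9]  # degree-0..9 Chebyshev basis rescaled to [0, 1]

cond(V)  # roughly 1e6: the near-collinear columns make the fit numerically fragile
cond(C)  # orders of magnitude smaller: the least-squares problem stays well conditioned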
chebregression from FastChebInterp.jl may also be worth trying.
The construction of the predictor terms can instead be done as

using Combinatorics
# For each degree d = 1, ..., n, build the interaction of every d-element subset of the variables
rhs = sum(sum(map(splat(&), subsets(Term.(vars), d))) for d in 1:n)
Thanks, that’s a very neat one-liner creating only the desired interactions (instead of filtering them out of a much larger set as my code did). @WuSiren, this code solves the performance problem I mentioned and also works when m is large but n ≪ m.
FastChebInterp does exactly this — it does multi-variate polynomial regression in the Chebyshev-polynomial basis, so it avoids the ill-conditioning that arises for the monomial x^n basis.
It’s very useful for smooth functions in a few dimensions, but in more than 4–5 dimensions you run into the curse of dimensionality quickly. A variant that sometimes works better in higher dimensions is to do something based on sparse grids, and there are various packages in Julia for that. You can also try Surrogates.jl for various other kinds of regression bases, e.g. radial basis functions.
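If it helps, a minimal 1-D sketch of how that could look (the exact chebregression signature here is from memory, so please double-check it against the FastChebInterp.jl README):

using FastChebInterp

xs = rand(200)                                # sample points in [0, 1]
ys = sin.(5 .* xs) .+ 0.01 .* randn(200)      # noisy observations
c = chebregression(xs, ys, 0.0, 1.0, 10)      # degree-10 least-squares fit in the Chebyshev basis
c(0.3)                                        # evaluate the fitted polynomial at x = 0.3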
CoupledFields.jl can also do high-dimensional polynomial regression. It’s based on kernel ridge regression and finds the directions of the active subspace via a kernel matrix. Both X and Y can be multivariate.
In 25 dimensions, you really need to think more carefully about what you are doing.
Even with just linear (degree-1 in each variable) multivariate models, this has 2^{25} (≈ 34 million) parameters. Do you have hundreds of millions of data points? For degree 2 (quadratic in each variable) models you will have 3^{25} ≈ 850 billion parameters, so you will need trillions (or more) data points. (Even if you do have that much data, solving for the fit parameters is then a supercomputer-scale system of linear equations.)
Unless you mean polynomial models where each term has at most n variables? That is much more manageable: e.g., if each term has at most 2 variables (and at most one of each) then you only have \binom{25}{0} + \binom{25}{1} + \binom{25}{2} = 326 unknowns. (If you allow the same variable to appear more than once in each term, you get a slightly larger number, 351: n-choose-k becomes n-choose-k with replacement.) But of course, by doing so you have greatly restricted your polynomials (in fact, it’s somewhat similar in spirit to sparse grids: you’ve restricted the mixed derivatives).
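For reference, both counts are easy to reproduce (a quick check of the arithmetic above, not part of the original reply):

m = 25
sum(binomial(m, k) for k in 0:2)          # 326: terms with at most 2 distinct variables
sum(binomial(m + k - 1, k) for k in 0:2)  # 351: same, but allowing repeats such as x1^2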
There are lots of high-dimensional regression models (from sparse grids to kernel methods to neural nets) that try to avoid the curse of dimensionality. But they all trade off some generality, and go beyond simple polynomial regression.
Maybe I’m misunderstanding the code here, but doesn’t this produce a single term? reduce(*, Term.(vars)) is equivalent to prod(Term, vars), no? If you wanted to go this route, wouldn’t you use Combinatorics.jl or similar to get all possible combinations of terms?
Thank you, @stevengj , for your patient and professional answers!
In fact, I was referring to the second case you mentioned, which is essentially a least-squares problem with 351 parameters. I’m sorry I didn’t make it clear at the beginning.
You mentioned kernel methods and neural networks, and I am wondering whether least-squares regression based on a polynomial feature mapping can also be considered a kernel method, or whether it can be seen as a special kind of neural network. It would be nice to have some literature on how they relate to each other.
Well, * on terms does expand into all combinations, as the formula y ~ x1 * x2 * x3 means all interactions up to that order in R. Thus,
julia> @formula y ~ x1 * x2 * x3
FormulaTerm
Response:
y(unknown)
Predictors:
x1(unknown)
x2(unknown)
x3(unknown)
x1(unknown) & x2(unknown)
x1(unknown) & x3(unknown)
x2(unknown) & x3(unknown)
x1(unknown) & x2(unknown) & x3(unknown)
julia> vars = [:x1, :x2, :x3];
julia> prod(Term, vars)
x1(unknown)
x2(unknown)
x1(unknown) & x2(unknown)
x3(unknown)
x1(unknown) & x3(unknown)
x2(unknown) & x3(unknown)
x1(unknown) & x2(unknown) & x3(unknown)
# This is the interaction between x1, x2 and x3 alone (written as x1 : x2 : x3 in R formulas)
julia> reduce(&, Term.(vars))
x1(unknown) & x2(unknown) & x3(unknown)
Ah, thanks — I didn’t realize that * meant something different from multiplication here.
Yes, polynomial regression can be considered as a kernel method. The kernel is non-stationary with a variance that quickly grows in |\mathbf{x}| and a specific weighting of each interaction order (for the glorious details see chapter 4.2.2 of GPML).
Shallow neural networks give rise to a different kernel which is also non-stationary, but becomes constant for large |\mathbf{x}| (chapter 4.2.3 of the same book).
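As a concrete toy illustration of that equivalence (everything below is made up for the demo, not from GPML or this thread): ridge regression on explicit degree-2 polynomial features gives exactly the same predictions as kernel ridge regression with the polynomial kernel k(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^\top \mathbf{z})^2, whose expansion carries the specific per-order weights mentioned above (note the sqrt(2) factor on the linear features):

using LinearAlgebra

N, m = 50, 3
X = randn(N, m)                               # rows are observations (toy data)
y = randn(N)
λ = 1e-3                                      # ridge regularization

k(x, z) = (1 + dot(x, z))^2                   # degree-2 polynomial kernel
ϕ(x) = vcat(1.0, sqrt(2) .* x, vec(x * x'))   # explicit features with dot(ϕ(x), ϕ(z)) == k(x, z)

# Kernel ridge regression (dual form): never builds the feature matrix
K = [k(X[i, :], X[j, :]) for i in 1:N, j in 1:N]
α = (K + λ * I) \ y
f_kernel(x) = sum(α[i] * k(X[i, :], x) for i in 1:N)

# Ridge regression on the explicit polynomial features (primal form)
Φ = reduce(vcat, (ϕ(X[i, :])' for i in 1:N))
w = (Φ' * Φ + λ * I) \ (Φ' * y)
f_explicit(x) = dot(ϕ(x), w)

xtest = randn(m)
f_kernel(xtest) ≈ f_explicit(xtest)           # true: the two formulations agree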
The syntax is a DSL called Wilkinson-Rogers notation; see also Use some character other than * for "main effects and interactions" · Issue #99 · JuliaStats/StatsModels.jl · GitHub for more discussion.
Thanks a lot!!!