In my research, I often work with problems that essentially reduce to approximating a conditional expectation, say \mathbb{E}[g(Y) \mid X = x], by a linear combination of basis functions, \mathbb{E}[g(Y) \mid X = x] \approx \sum_{i} a_i \psi_i(x), where the coefficients a_i are estimated using, e.g., ordinary least squares. To apply least squares, we want to compute, for a sample x^1, \dots, x^n, the design matrix \mathbf{X} = (\psi_i(x^k))_{k,i}, while for pseudo-regression we may wish to compute the Gram matrix \Psi = (\mathbb{E}[\psi_i(X)\psi_j(X)])_{i,j}, etc.
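To make that concrete, here is a minimal self-contained sketch of the least-squares step (the toy basis and all names are mine, not from any package):

```julia
# Minimal OLS sketch: given callable basis functions ψ_i and samples
# (x^k, y^k), build the design matrix and solve for the coefficients a.
ψ  = [x -> 1.0, x -> x[1], x -> x[2], x -> x[1] * x[2]]  # toy basis in 2 variables
xs = [randn(2) for _ in 1:200]                           # sample x^1, ..., x^n
ys = [sin(x[1]) + x[2] for x in xs]                      # toy responses

X = [ψi(x) for x in xs, ψi in ψ]  # n × m design matrix, X[k, i] = ψ_i(x^k)
a = X \ ys                        # OLS coefficients via QR factorization
```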
So that raises the question: is there a package that lets you generate some Basis object based on simple rules (such as: take polynomials in X_1 of degrees 1:d, take Laguerre polynomials in X_2 of degrees 1:d, and take all products such that the resulting polynomials have degree at most d) and that then constructs and evaluates the functions \psi_i (and, if possible, the matrix \Psi)? (This can then be seen as an algebra of basis functions, in the sense that we join sets of basis functions together, take their products, etc.)
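To illustrate the kind of algebra I mean, here is a tiny self-contained sketch (entirely my own toy code, not an existing package API): a basis is a list of (degree, function) pairs, joining is concatenation, and products are formed pairwise and truncated by total degree.

```julia
# Toy "algebra of basis functions": bases are vectors of (degree, function)
# pairs; joining is vcat, products are pairwise with a total-degree cutoff.
struct BasisFun
    degree::Int
    f::Function          # acts on a sample point x (a vector)
end

# univariate monomials x_j^p, lifted to act on the full point x
monomials(j, degs) = [BasisFun(p, x -> x[j]^p) for p in degs]

# all pairwise products with total degree at most dmax
products(B1, B2; dmax) =
    [BasisFun(b1.degree + b2.degree, x -> b1.f(x) * b2.f(x))
     for b1 in B1, b2 in B2 if b1.degree + b2.degree <= dmax]

d = 3
B = vcat(monomials(1, 1:d), monomials(2, 1:d),
         products(monomials(1, 1:d), monomials(2, 1:d); dmax = d))
```

A real package would of course use orthogonal polynomial families (Laguerre, Hermite, ...) instead of raw monomials, plus the evaluation and Gram-matrix machinery on top.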
For my own research I've built quite a bit of code to do just this, and although I have spent some time optimizing for speed, I'm not using the GPU or similar acceleration, even though I think some kernel abstractions would be possible here. I've also written APIs for various uses, e.g.
- to automatically generate a Basis consisting of the orthonormal/orthogonal basis functions for a (product) distribution that you specify (see the sketch after this list)
- to check that a certain subset of the Basis satisfies a certain condition on its moments
- to split the Basis so as to compute only the basis functions corresponding to certain variables
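As an example of the first point: for a standard normal marginal, the orthonormal functions are the normalized probabilists' Hermite polynomials, which can be generated from the three-term recurrence (a sketch of my own code, not a package API):

```julia
# Orthonormal basis for X ~ N(0,1): probabilists' Hermite polynomials,
# He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x), normalized by sqrt(n!)
# since E[He_m(X) He_n(X)] = n! δ_{mn}.
function hermite_orthonormal(n, x)
    n == 0 && return one(x)
    p_prev, p = one(x), x          # He_0 and He_1
    for k in 1:n-1
        p_prev, p = p, x * p - k * p_prev
    end
    return p / sqrt(factorial(n))  # factorial overflows for n > 20
end

# the basis functions ψ_0, ..., ψ_d for a standard normal marginal
hermite_basis(d) = [x -> hermite_orthonormal(n, x) for n in 0:d]
```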
If there already is a package that does this (ideally with high performance), I'd like to check whether I can adapt my API to it and use it in my projects (and perhaps contribute my code there). If there is not yet a package for this, I'd like to consider turning my code into a stand-alone package once my PhD schedule quiets down a bit.
The problem is with the sampling. Ordinary least-squares systems are determined by a finite set of data points. Maybe the properties of the functions involved offer a nicer solution. Should you sample points, or use fancier calculus tricks to get a more accurate least-squares fit at lower cost? That depends on many factors, such as the functions chosen. Numerical integration is an art in its own right: choosing sample points (and perhaps other, fancier tricks) to get a descriptive picture of a fixed function, let alone fitting functions by least squares. I suspect there are nicely structured functions for which you could set up the integral explicitly and then minimize the integral itself instead of having to sample. I'm not that good at calculus, though.
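For instance, when the weight is Gaussian, the "integrate instead of sample" idea is exactly what Gaussian quadrature gives you; a sketch using FastGaussQuadrature.jl (assuming a standard normal X and polynomial basis functions, for which the quadrature is exact):

```julia
using FastGaussQuadrature  # gausshermite(n): nodes/weights for ∫ f(t) exp(-t^2) dt

# Substituting x = √2 t turns Gauss-Hermite quadrature into an expectation
# under the standard normal; exact for polynomial integrands of degree < 2n.
t, w = gausshermite(20)
expectation(f) = sum(w .* f.(sqrt(2) .* t)) / sqrt(π)

# e.g. a Gram-matrix entry E[ψ_i(X) ψ_j(X)] for monomials ψ_i(x) = x^i:
Ψ_entry(i, j) = expectation(x -> x^(i + j))
Ψ_entry(2, 2)  # E[X^4] = 3 for X ~ N(0,1)
```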
It depends on whether you want a quick “good enough” solution or the best possible one, and so on.
I see your point. My question is actually concerned less with the regression part (which serves more as the motivation) and more with the algebra-of-basis-functions part of my post: just as we can build fancy models by composing parts in MTK, and nice plots using AlgebraOfGraphics, I was wondering whether there is a way to build complicated sets of basis functions from simpler ones (and then, once you have a unified API for this, you can think about fitting OLS (simply sampling points and regressing), pseudo-regression (computing the necessary expectations, when possible), or other things).
Thanks, I hadn't thought of that yet; so you could do all the logic for generating a set of functions (joining sets of functions together, adding cross-products, etc.) on the FormulaTerms, and then walk the FormulaTerm to do fancy things (e.g. compute expectations when the data follows a product distribution).
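Something along these lines, if I understand the suggestion correctly (a rough sketch with StatsModels.jl; the programmatic term algebra is real, the expectation logic would be my own layer on top):

```julia
using StatsModels

# Build a formula programmatically: the +/& term algebra is where the
# "algebra of basis functions" logic would live.
f = term(:y) ~ term(1) + term(:x1) + term(:x2) + term(:x1) & term(:x2)

tbl = (y = randn(100), x1 = randn(100), x2 = randn(100))  # any Tables.jl table
sch = schema(f, tbl)          # infer concrete term types from the data
ts  = apply_schema(f, sch)    # resolved FormulaTerm
y, X = modelcols(ts, tbl)     # response vector and design matrix
```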
I'll keep this in mind and revisit it once I have time, to check whether this would support all the things I want to do and how I can write an API for them!
You might want to take a look at ApproxFun.jl if you haven't already. Not sure if it supports everything you need, but it's the first stop for working with function spaces via orthogonal bases (similar to chebfun.org for MATLAB, whose thorough and pedagogical documentation you might find helpful even if you're using Julia and ApproxFun).
Thank you, that is a good pointer. The way I understand it, you have to provide it with a function up front, which is kind of the problem in my case: I'm interested in estimating a conditional expectation from sampled points, so I want to be able to do some form of regression, such as least squares, and to go up to higher dimensions. For my typical use case I'm looking at roughly 6 dimensions and on the order of 100-1000 basis functions. (I know this is not ideal, but the algorithms NEED basis functions with certain properties, so fancier approximation methods are off the table for me, unfortunately.)
If you're doing least squares, ApproxFun.jl can help build the Vandermonde matrix for your orthogonal function basis. See Frequently Asked Questions · ApproxFun.jl (the example starts with interpolation but progresses to regression by the third code block and to multivariate regression by the fourth).
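The construction there goes roughly like this (a sketch; the FAQ has the exact version):

```julia
using ApproxFun

S = Chebyshev()            # any ApproxFun space with an orthogonal basis
x = points(S, 200)         # sample points (your own data works too)
m = 10                     # number of basis functions to regress on

# Vandermonde matrix: column k evaluates the k-th basis function at x
V = hcat((Fun(S, [zeros(k - 1); 1]).(x) for k in 1:m)...)

c = V \ exp.(x)            # least-squares coefficients for a toy target
f = Fun(S, c)              # the fit as a first-class ApproxFun function
```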