Basis functions to approximate the inverse of a quadratic function

Yes, in fact I wrote a package that does precisely this — see the chebregression function in FastChebInterp.jl. Monomials are indeed a terrible basis at high degrees! But you still need to be careful fitting noisy data with high-degree polynomials, because overfitting can still happen even if the matrices are well-conditioned.
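In case it helps, here is a minimal sketch of using it. I'm assuming the chebregression(x, y, lb, ub, order) form shown in the FastChebInterp.jl README; check the package docs for the exact signature:

using FastChebInterp

# noisy samples of a smooth function on [0, 10]
x = collect(range(0, 10, length=200))
y = sqrt.(x) .+ 0.01 .* randn(length(x))

# degree-8 Chebyshev least-squares fit on [0, 10]
c = chebregression(x, y, 0, 10, 8)

c(3.7)  # evaluate the fitted polynomial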

2 Likes

Indeed, that's definitely the case, which is why I use them in Bayesian regression with strong priors on function behavior :smirk:. That's more or less where I was thinking of going with this problem, but first I'm trying to figure out what the full problem is. My impression is that the OP wanted to estimate q(t) and planned to do it by first estimating a function g(p) to get noisy estimates of q, and then estimating q by fitting those noisy estimates, whereas I was thinking of going a different route. Perhaps @EricJohnson can clarify.

I thought about building a parametric model of q as a function of p and using Q to estimate the parameters.
As I described, I in fact have a set of Q_k and p_k(t), so that should be a valid way to estimate the parameters.

Do you also have the time information t_k for each observation?

So it’s just the model of q = g(p) which is of concern, not the function q(t) which is implicit?

I ask because if you want to know what happened in terms of q_i(t), you can do Bayesian inference. Let q_i(t) be one of your time series of q, let f(x) = a + bx + cx^2 be an unknown function such that f(q_i(t)) = p_i(t), let p_i(t_k) be observed values of p_i at various times, and let Q_i = \int_0^T q_i(t)\,dt.

Describe q_i(t) in terms of a Chebyshev series with coefficients a_{i}[n].
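For concreteness, with t mapped affinely onto the Chebyshev domain [-1, 1], that expansion is

q_i(t) = \sum_{n=0}^{N-1} a_i[n]\, T_n\!\left(\frac{2t - t_{\min} - t_{\max}}{t_{\max} - t_{\min}}\right),

where T_n is the n-th Chebyshev polynomial of the first kind.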

You can place some prior distributions on a,b,c and the a_i[n] values, then your observed data is the p_i(t_k) observations and the Q_i observation, which you can place probability distributions for errors on.

Running the entire thing through a sampler, you will get samples of the a, b, c coefficients that make up the function f(q), and you will also get time series for q_i(t). This is likely to be a LOT more stable than trying to take noisy measurements of p, invert them through a function with an infinite derivative at 0 to get q, numerically integrate them to get Q, and then optimize that whole procedure to find a representation of the upper branch of the inverse of a + bx + cx^2.

1 Like

(A sqrt singularity is well-conditioned — the relative condition number of \sqrt{x} is 1/2 for all x.)
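That follows directly from the definition of the relative condition number:

\kappa(x) = \left|\frac{x f'(x)}{f(x)}\right| = \left|\frac{x \cdot \frac{1}{2\sqrt{x}}}{\sqrt{x}}\right| = \frac{1}{2} \quad \text{for } f(x) = \sqrt{x},\; x > 0,

so a relative error in the input is halved, not amplified, by the square root.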

Hmm. I’m thinking of this statement

On the basis of this, I'm imagining the p(t) values are measured with noise; since they can't be less than 0, the noise is biased high, and the function is roughly q = \sqrt{p}. The noise ensures that you're never going to get q \sim 0, and yet you're adding up a bunch of q values to get Q. If the true value of q can be near zero, you could easily end up with an estimate of Q which is essentially pure noise, right?

As long as all the true values of q are away from 0 you’ll do ok, but if q is supposed to spend some time near 0 you’ll do very badly I’d think.
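A quick simulation illustrates the concern. This is a sketch, assuming q = \sqrt{p} and nonnegative measurements obtained by clipping additive Gaussian noise at zero:

# true q spends its time near zero, so true p = q^2 is even closer to zero
t = range(0, 1, length=1000)
q_true = @. 0.05 * sin(pi * t)^2
p_true = q_true .^ 2

# noisy, nonnegative measurements of p (clipping at zero biases them high)
p_meas = max.(p_true .+ 0.01 .* randn(length(t)), 0.0)

# invert pointwise and integrate with a simple Riemann sum
Q_true = sum(q_true) * step(t)
Q_est  = sum(sqrt.(p_meas)) * step(t)

(Q_true, Q_est)  # Q_est is dominated by the noise floor, not by q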

What do you mean?

This is interesting.
I don’t have experience with this approach.

  1. What prior should I use? I am not sure I can even say anything about the values.
  2. How can I describe q_i(t) as a Chebyshev series?

Could you guide me through this?
I’d be happy to give it a try.

Sure. There's a really great package for Bayesian inference in Julia called Turing.jl; let's use it as an example. Let's assume we have just one time series; you can elaborate further on your own.

using Turing, LinearAlgebra, ApproxFun

@model function ejmodel(tvals, pvals, Qval, N)
    # weak priors on the coefficients of the quadratic p = a + b*q + c*q^2
    a ~ Normal(0, 1000)
    b ~ Normal(0, 1000)
    c ~ Normal(0, 1000)
    # weak prior on the N Chebyshev coefficients (each with standard deviation 1000)
    chebcoefs ~ MvNormal(zeros(N), 1000.0^2 * I(N))
    # q(t) as a Chebyshev series on the time interval 0..1000 (adjust as needed)
    q = Fun(Chebyshev(0..1000), chebcoefs)
    # indefinite integral of q, evaluated across the observed time span
    Q = cumsum(q)
    mint, maxt = extrema(tvals)
    Qcalc = Q(maxt) - Q(mint)
    # likelihoods: placeholder distributions, to be chosen based on the measurement errors
    Qval ~ SomeDistribution(Qcalc, some_parameters)
    pvals ~ SomeOtherDistribution([a + b*q(t) + c*q(t)^2 for t in tvals], some_other_params)
    return (Qest = Qcalc,)
end

Here we tell it that the a, b, c values are numbers likely to be around 0, but we wouldn't be surprised if they were in the thousands. Similarly, we use an N-term Chebyshev series defined on the time interval 0..1000 (adjust as needed), so the chebcoefs are individually around 0 but could be in the thousands.

Then we calculate the indefinite integral of the q function, and we say that Qval and pvals should be “near” the values we get from evaluating the definite integral and from mapping the a + bx + cx^2 polynomial across the q(t) values at our series of times.

Note we can “do better” by adjusting the various priors, but with enough data it’s likely this doesn’t matter.

Also, you may say that SomeDistribution should be relatively sharply peaked around the prediction, so maybe Normal(Qcalc,0.1) would be appropriate for that distribution. On the other hand, the p values may require more thought in the choice of distribution based on what you know about your measurement errors.

Edit: I modified it so we keep the calculated integral value so you can later retrieve it for plotting etc.
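To actually run it, something along these lines should work. This is a sketch with made-up data and settings, and it assumes you have replaced SomeDistribution / SomeOtherDistribution with concrete distributions:

# hypothetical data; substitute your observed time series and integral
tvals = collect(range(0.0, 1000.0, length=50))
pvals = rand(50)
Qval  = 1.0
N     = 10   # number of Chebyshev coefficients

model = ejmodel(tvals, pvals, Qval, N)
chain = sample(model, NUTS(), 1000)    # posterior samples of a, b, c, chebcoefs

# recover the returned (Qest = ...) value for each posterior sample
Qests = generated_quantities(model, chain)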

1 Like

@dlakelan, if I get the model right:

  1. It is assumed the parameters are independent.
  2. They are modeled with high-variance Gaussians, probably almost flat in the neighborhood of the “correct” values.
  3. It is assumed there are many “samples” to infer from (So the prior weight is relatively low).

I wonder, under those assumptions, wouldn't the result be very similar to the approach of @stevengj?
His idea, nonlinear least squares, is basically the maximum likelihood estimator.
So the two should coincide, assuming the output is the posterior mode or close to it.

Agreed, the model as stated has little prior information included; however, there is no reason we can't modify that if we have prior information. For example, if the parabola is probably centered on q = 0, we could use that information directly.

There is no reason we need to use normal distributions for the error in p; in fact, if the “data is very jumpy” as stated, then perhaps we use some t distribution or something with skewness, etc.
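For instance, inside the @model above, the placeholder likelihood for pvals could be swapped for a heavier-tailed one. A sketch using Distributions' location-scale shift of a Student t, where σ and ν are extra parameters that would also need priors:

σ ~ Exponential(1.0)          # scale of the measurement error
ν ~ Gamma(2.0, 10.0)          # degrees of freedom of the t distribution
for (k, t) in enumerate(tvals)
    pred = a + b*q(t) + c*q(t)^2
    pvals[k] ~ pred + σ * TDist(ν)   # Student-t errors around the predicted p
end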

The posterior distribution may not be at all tight, in which case maximum likelihood estimates give a kind of overfitting. Yes, you can do some confidence-interval procedures, but the Bayesian intervals are generally better in various ways.

There may be strong and nonlinear dependencies between the coefficients. The posterior samples will allow you to see this sort of thing easily.

Finally, the software to do the fit is easy to use, flexible, and lets you formulate ANY problem with any choice of distributions etc and you can do maximum a posteriori estimation or sampling. The tools can be used in a wide variety of ways and may be worthwhile to learn for other problems as well.

There’s nothing wrong with doing least squares if it suffices for your problem, but a Bayesian model is kind of the ultimate generalization of that and may offer advantages.

1 Like

I totally agree.
I really like Bayesian-based optimization.

I wanted to point out that the magic happens with the correct priors.
I am not sure the information provided here gives us the answer.

Agreed, though the magic happens with the likelihoods as well. “Least squares” corresponds to normal distributions for the errors and no prior. Sometimes we have information that data distributions should be non-normal, even very non-normal: exponential, gamma, t, Cauchy, or whatever. The general statement is that the magic happens when Bayes gives you a way to use the specific information you have to get a better answer.
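To make that correspondence explicit: with independent Gaussian errors of fixed variance \sigma^2 and a flat prior, maximizing the likelihood is the same as minimizing the sum of squares,

\arg\max_\theta \prod_k \exp\!\left(-\frac{(y_k - f_\theta(x_k))^2}{2\sigma^2}\right) = \arg\min_\theta \sum_k \left(y_k - f_\theta(x_k)\right)^2,

so any non-Gaussian error model immediately takes you beyond plain least squares.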

1 Like