# Creating an interpolating 2D lookup table

I have the function p=f(a, b) and can generate a grid of values for all valid values of a and b using a simulation. The values of p are steadily increasing with a and steadily decreasing with b.

I need a numeric approximation for b=f(a, p).

How would you do that?

If your f(a,b) is a smooth function, I would evaluate it on a Chebyshev grid and then do Chebyshev interpolation, since that will converge exponentially rapidly with the number of grid points. e.g. you can use FastChebInterp.jl.

Once you have an interpolant for p = f(a,b), then if you want to invert it in the second argument to find b(a,p) you can do so very quickly (e.g. by Newton’s method), and better yet you can do so on a Chebyshev grid of (a,p) points and form a new Chebyshev interpolant for b(a,p).

6 Likes

Thanks, I will try! OK, the 1D interpolation is working. I have my points in a dataframe:

julia> df
7×3 DataFrame
Row │ Γ0       Γests           Eps
│ Float64  Any             Array…
─────┼────────────────────────────────────────────────────────────
1 │   1.0    0.85:0.025:1.0  [-6895.08, -5862.32, -4767.1, -3…
2 │   0.975  0.85:0.025:1.0  [-5891.23, -4795.81, -3639.82, -…
⋮  │    ⋮           ⋮                         ⋮
6 │   0.875  0.85:0.025:1.0  [-1243.74, 3.61433e-12, 1177.96,…
7 │   0.85   0.85:0.025:1.0  [1.13868e-10, 1184.64, 2260.65, …


This works:

            Γests = df.Γests
Eps   = df.Eps
c_ep = chebregression(Γests, Eps, minimum(Γests), maximum(Γests)+eps(), length(Γests)-1)


But I need a 2D regression. I can stack the vectors of the second and third column of the data frame to get 2D arrays. But how do I have to use the first column?
I mean, I try to get an interpolation for: E_p=f(\Gamma, \Gamma_{est}), so I must use the first column somehow…

If you can choose the points, don’t use chebregression. Evaluate at Chebyshev points and use chebinterp.

Good point, but this does not answer my question…

See the examples in the FastChebInterp manual. If you simply evaluate your function at the Chebyshev points, then you only need to supply a single 2d array of function values at those points to chebinterp. I have no opinion on how you store them in the dataframe, but assuming you store them in a column the correct order then you will only need a reshape on that column to get the function values in the order chebinterp expects.

I solved my problem. Given the data frame:

julia> df
7×3 DataFrame
Row │ Γ0       Γests           Eps
│ Float64  Any             Array…
─────┼────────────────────────────────────────────────────────────
1 │   1.0    0.85:0.025:1.0  [-6895.08, -5862.32, -4767.1, -3…
2 │   0.975  0.85:0.025:1.0  [-5891.23, -4795.81, -3639.82, -…
⋮  │    ⋮           ⋮                         ⋮
6 │   0.875  0.85:0.025:1.0  [-1243.74, 3.61433e-12, 1177.96,…
7 │   0.85   0.85:0.025:1.0  [1.13868e-10, 1184.64, 2260.65, …


The following code can be used to create the 2D approximation function:

using FastChebInterp, DataFrames
order = Int64(floor(2*sqrt(length(df.Γests))))
println("Max order for equidistant sampling: \$(order)")
X = Vector{Float64}[]
Y = Float64[]
for i in 1:length(df.Γ0)
for j in 1:length(df.Γests[i])
vec = [df.Γ0[i], df.Γests[i][j]]
push!(X, vec)
push!(Y, df.Eps[i][j])
end
end
c_ep = chebregression(X, Y, (order, order))


My problem was to understand how to build the X and Y vectors, but now I understand it.

If you do not use Chebyshev points or nodes there is the risk that you get weird oscillations, but only if the order of your approximation is too high.

Generally, when using m equidistant points, if N<2 \sqrt{m} then the least squares
approximation P_N(x) is well-conditioned. (Runge's phenomenon - Wikipedia)

Also worth reading: Chebyshev nodes - Wikipedia

For a polynom of the order 5x5 I would need only 36 functions calls when using Chebyshev points, but I need 49 calls with an equidistant grid. I might still implement this, but first I had to make the conversion from my data frame to the X and Y vectors work…

I use this data frame because it makes it easy to plot the data, which represents simulation results with a set of lines.