How to predict X given a set of Ys in Julia?

Hello,
I have a vector V of y values describing a curve for a series of time points x; I have a threshold level Yt. How can I predict the exact x value for Yt?
Essentially: how do I interpolate Y to find X in Julia?
I can see there is a specific package (interpolations.jl) but requires the creation of a matrix first. IS there a simpler method (without the requirement of a matrix) that works on curves (rather than linear data)?
Thank you

V = [100.0, 118.68686748100414, 197.53957741968554, 405.39743832303634, 909.384506158371, 2298.8440311204035, 6352.7577141124375, 19137.261442828225, 61473.086836666625, 204601.00673831577, 662816.455857954, 1.3996572716346516e6, 1.9351053374287586e6, 2.0969909161631712e6, 2.1526434991115257e6, 2.1632872818975276e6, 2.1650948890150622e6, 2.1652845329468683e6, 2.165299329101924e6, 2.165299308178196e6, 2.165300516603722e6, 2.165289152377223e6]
Yt = 500000

Do you want linear interpolation between points, a polynomial fit or something else?

If you want linear interpolation, it looks like Interpolations.jl does exactly that, what’s your issue with it?

I have a sigmoid, so perhaps a polynomial?

Generally, which function you want to fit is a choice you will have to make, and motivate, yourself. There’s an infinite number of functions which fit any finite set of points. For discussion’s sake, say you choose model(x, p) = p[1] .+ p[2]*atan.(p[3]*x .+ p[4]). You can then say

using LsqFit
fit = curve_fit(model, x, y, [0.0, 1.0, 1.0, 0.0])
p = fit.param

to get the best choice of p. For your query, you want the inverse, but that’s just a question of massaging the model equation with some simple arithmetic (noticing that the inverse of atan is tan). You can use the same parameters p.

2 Likes

Complete example:

using Plots, LsqFit
default(;fontfamily="Computer Modern", label="") # Vanity settings
V = [100.0, 118.68686748100414, 197.53957741968554, 405.39743832303634, 909.384506158371, 2298.8440311204035, 6352.7577141124375, 19137.261442828225, 61473.086836666625, 204601.00673831577, 662816.455857954, 1.3996572716346516e6, 1.9351053374287586e6, 2.0969909161631712e6, 2.1526434991115257e6, 2.1632872818975276e6, 2.1650948890150622e6, 2.1652845329468683e6, 2.165299329101924e6, 2.165299308178196e6, 2.165300516603722e6, 2.165289152377223e6]
V_t = 500_000
t = range(0, 60; length=length(V))
model(x, p) = 1e6*(p[1] .+ p[2]*atan.(p[3]*x .+ p[4])) # Scaling by 1e6 to make more sense numerically
fit = curve_fit(model, t, V, [0.0, 1.0, 1.0, 0.0])
p = fit.param
t_t = (tan((V_t/1e6 - p[1])/p[2])-p[4])/p[3]
plot(t, V; st=:scatter)
plot!(x->model(x, p))
plot!([t_t], [V_t]; st=:scatter)

image

6 Likes

Hi @Luigi_Marongiu

Interpolations.jl allows for SPlines, you don’t need “the creation of a matrix” and “works on curves”

using Plots

x = 1:0.5:5
y = 1 ./ (1 .+ exp.(-x))
scatter(x,y)

using Interpolations
bsi=Interpolations.scale(interpolate(y,BSpline(Cubic(Line(OnGrid())))),x)

x = 1:0.1:5
plot!(x,bsi.(x))

image

7 Likes

Thank you. I tried with a Michaelis-Menten (more biologically relevant) but it does not fit:

x = collect(0:2.8:60) 
y = V
par = [0.0, 1.0]
model = michaelis_menten(x, p) = (x .* p[1]) ./ (x .+ p[2])
fit = curve_fit(model, x, y, par)
P = fit.param
Y = michaelis_menten(x, P)

I am trying with a logistic but it also does not give good results:

model = logistic(x, p) = ((p[1].*x) .* (1 .- (x./p[2]))) 
fit = curve_fit(model, x, y, par)

With a ‘real’ logistic, I get a bit better but still a lot off:

par = [10^7, 0.0, 1.0] # par = [0.0, 1.0]
model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3])))
fit = curve_fit(model, x, y, par)

How can I improve the regression with these functions?

So I’m not intimately familiar with the michaelis-menten function, but it might be worth considering adding more parameters (for instance amplitude, width, x-shift, y-shift). And try to rescale the data so the parameters are within a couple of orders of magnitude (like I did with the 1e6 above).

Plotting with linear axes will probably make the behavior more intuitively obvious.

It seem like having zeroes in your model’s parameters might not make much sense since, for instance, for michaelis_menten you would only have zeros as a result. In any case, The plots you show do not correspond to the curve_fit results either, see below changing 0.0 to 0.2 in param:

using Plots
using LsqFit

x = collect(0:2.8:60) 

param = [0.2, 1.0]
model = michaelis_menten(x, p) = (x .* p[1]) ./ (x .+ p[2])
fit = curve_fit(model, x, model(x,param), param)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))

image

model = logistic(x, p) = ((p[1].*x) .* (1 .- (x./p[2]))) 
fit = curve_fit(model, x, model(x,par), par)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))

image

param = [10^7, 0.2, 1.0]
model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3])))
fit = curve_fit(model, x, model(x,param), param)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))

image

I think the problem was in the X data, now I got this fitting:


the logistic model with exponents gives me the best fitting. I’ll try now to interpolate.
Tx

Hello,
assuming the fitting phase is done (I have a function with fitted parameters that represent the observed data), how do I run the interpolation? From Interpolations, it looks that (a) the main process is linear (whereas the curve here is not linear by definition), (b) it requires a matrix (whereas I have a single value to interpolate: 500 000), (c) curved interpolation already has its functions (Constant, Linear, Quadratic, and Cubic) thus there was a need to do the fitting part, (d) there is no space for the function I defined in the fitting phase, for instance the object itp = interpolate(a, BSpline(Constant())) takes a as the value to be interpolated, and provides already the function for the task (BSpline).
How can I find the X value for Y=500000 given the function true_logistic I defined and the fitted parameters I have found:

3-element Vector{Float64}:
  2.1647983679173873e6
  0.7211511621759724
 13.214716462754794

Thank you

If you have a fitted function, you don’t need Interpolations. Just do

model(x, p) = ...
fit = curve_fit(model, x_data, y_data, p_0)
p = fit.param

new_y = model(new_x, p)
new_ys = model.(new_xs, Ref(p))

At least in physics, curve fitting is strictly better than naive interpolation, because you leverage the knowledge you have about the system to find the most probable intermediate values. And you can get estimates for the quality of fit, and confidence bounds for your predictions.

4 Likes

In this case, though, I need:

new_x = model(new_y, p)

if I put the new_y as the level of 500 000 that I want, with the fitted p, I get two million, that is obviously wrong. How do I get X given Y?
Thanks

can you try using this parametrization? to see what comes out?

f1(x,p) = p[1] ./ (p[2] .* x .^ p[3] .- 1.0)
param = [2.165e6, 10.0, -2.0]

also, those points look a lot like bode plots for amplitude of response systems

I get:

julia> f1(x,p) = p[1] ./ (p[2] .* x .^ p[3] .- 1.0)
f1 (generic function with 1 method)
julia> param = [2.165e6, 10.0, -2.0]
3-element Vector{Float64}:
  2.165e6
 10.0
 -2.0
julia> fit_x = f1(500000, param)
-2.1650000000866e6

Does it mean that to get X given Y (Y->X) I need the inverse function and then run X->Y?

Yes, given a function f : x, p -> y, you’ll need to either

  1. Fit f to your x and y data (i.e. find appropriate p), then find the inverse g : y, p -> x and use those p to find x_n given y_n,
  2. Find g, then fit it to your y and x data to find x_n given y_n, or
  3. Fit f to your x and y data, then use some numerical solving method to find x_n given y_n

The two former methods should be roughly equivalent, and in my opinion superior to the last one as long as f is (easilly) invertible (always do analytics as far as you can, involving numerics only at the last step). I think I would go with option 1, because it’s more immediately intuitive.

I used symbolab to find the inverse function of the logistic and followed option 1:

julia> model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3]))) # set regression
true_logistic (generic function with 1 method)
julia> inverse_logistic(x, p) = ( p[1] .* p[2] .* exp.(-p[2] .* (x .- p[3])) ) ./ ( (1 .+ exp.(-p[2] .* (x .- p[3]))^2 ) 
inverse_logistic (generic function with 1 method)
julia> fit = curve_fit(model, x, y, par)
LsqFit.LsqFitResult{Vector{Float64}, Vector{Float64}, Matrix{Float64}, Vector{Float64}}([2.1647983679173873e6, 0.7211511621759724, 13.214716462754794], [57.26888702184834, 65.57394242932473, 97.69294768592027, 169.4743733818491, 308.55542081798865, 593.6144246584468, 1148.52487052776, 2116.5614422187937, 3246.5892025041103, 2642.0985198940616  …  2621.5980348929297, 1631.373413761612, 194.25900849187747, -341.1922586141154, -480.2387717682868, -499.5315029202029, -501.42603116389364, -500.9472773480229, -502.1486880327575, -490.7844598405063], [7.264828417848866e-5 -2078.1127703397933 -113.40640133348667; 8.51168462811274e-5 -2394.275920789786 -132.86858696614163; … ; 1.0000000000034532 7.689947814299762e-5 0.0; 1.0000000000034532 0.0 0.0], true, Float64[])
julia> P = fit.param
3-element Vector{Float64}:
  2.1647983679173873e6
  0.7211511621759724
 13.214716462754794
julia> fit_x = inverse_logistic(Xp, P)
0.0

but the expected value should be a bit more than 11…

Just as a sanity check, try

plot(x,y;st=:scatter)
plot!(x,true_logistic.(x,Ref(P)))
plot!(inverse_logistic.(y,Ref(P)),y)

These should all look about the same. Without looking too close into your inverse function, it looks a bit strange that p[1] is in the numerator of both functions, I would expect it to be in the delimiter for the inverse function (I.e. if the “amplitude” of f is large, the “amplitude” of f^{-1} should be small).

Having looked a bit closer at your function, i.e. manually solving y = a/(1+exp(-b(x-c))) for x, the inverse function should be

g(y,p) = p[3] - log(p[1]/y - 1)/p[2]

No idea what the inverse calculator was doing.

1 Like

interesting! I used the formula given by Wikipedia:
9e26947596d387d045be3baeb72c11270a065665

and I changed L=a, k=b, x₀=c, thus:
Screenshot from 2021-07-28 14-59-33
I’ll try your inverse, thank you!