 # How to predict X given a set of Ys in Julia?

Hello,
I have a vector V of y values describing a curve for a series of time points x; I have a threshold level Yt. How can I predict the exact x value for Yt?
Essentially: how do I interpolate Y to find X in Julia?
I can see there is a specific package (Interpolations.jl), but it requires the creation of a matrix first. Is there a simpler method (without the matrix requirement) that works on curves (rather than linear data)?
Thank you

``````V = [100.0, 118.68686748100414, 197.53957741968554, 405.39743832303634, 909.384506158371, 2298.8440311204035, 6352.7577141124375, 19137.261442828225, 61473.086836666625, 204601.00673831577, 662816.455857954, 1.3996572716346516e6, 1.9351053374287586e6, 2.0969909161631712e6, 2.1526434991115257e6, 2.1632872818975276e6, 2.1650948890150622e6, 2.1652845329468683e6, 2.165299329101924e6, 2.165299308178196e6, 2.165300516603722e6, 2.165289152377223e6]
Yt = 500000
``````

Do you want linear interpolation between points, a polynomial fit or something else?

If you want linear interpolation, it looks like `Interpolations.jl` does exactly that, what’s your issue with it?

Generally, which function you want to fit is a choice you will have to make, and motivate, yourself. There’s an infinite number of functions which fit any finite set of points. For discussion’s sake, say you choose `model(x, p) = p[1] .+ p[2]*atan.(p[3]*x .+ p[4])`. You can then say

``````using LsqFit
fit = curve_fit(model, x, y, [0.0, 1.0, 1.0, 0.0])
p = fit.param
``````

to get the best choice of `p`. For your query, you want the inverse, but that’s just a question of massaging the model equation with some simple arithmetic (noticing that the inverse of `atan` is `tan`). You can use the same parameters `p`.
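The massaging could look like this (a sketch, writing the four parameters explicitly as `p[1]`–`p[4]`; the name `inverse_model` is introduced here for illustration):

``````julia
# Invert y = p[1] + p[2]*atan(p[3]*x + p[4]) step by step:
#   (y - p[1])/p[2]      = atan(p[3]*x + p[4])
#   tan((y - p[1])/p[2]) = p[3]*x + p[4]
#   x = (tan((y - p[1])/p[2]) - p[4]) / p[3]
inverse_model(y, p) = (tan((y - p[1]) / p[2]) - p[4]) / p[3]
``````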


Complete example:

``````using Plots, LsqFit
default(;fontfamily="Computer Modern", label="") # Vanity settings
V = [100.0, 118.68686748100414, 197.53957741968554, 405.39743832303634, 909.384506158371, 2298.8440311204035, 6352.7577141124375, 19137.261442828225, 61473.086836666625, 204601.00673831577, 662816.455857954, 1.3996572716346516e6, 1.9351053374287586e6, 2.0969909161631712e6, 2.1526434991115257e6, 2.1632872818975276e6, 2.1650948890150622e6, 2.1652845329468683e6, 2.165299329101924e6, 2.165299308178196e6, 2.165300516603722e6, 2.165289152377223e6]
V_t = 500_000
t = range(0, 60; length=length(V))
model(x, p) = 1e6*(p[1] .+ p[2]*atan.(p[3]*x .+ p[4])) # Scaling by 1e6 to make more sense numerically
fit = curve_fit(model, t, V, [0.0, 1.0, 1.0, 0.0])
p = fit.param
t_t = (tan((V_t/1e6 - p[1])/p[2]) - p[4])/p[3]
plot(t, V; st=:scatter)
plot!(x->model(x, p))
plot!([t_t], [V_t]; st=:scatter)
``````

Interpolations.jl allows for splines, so you don’t need “the creation of a matrix”, and it “works on curves”:

``````using Plots

x = 1:0.5:5
y = 1 ./ (1 .+ exp.(-x))
scatter(x,y)

using Interpolations
bsi = Interpolations.scale(interpolate(y, BSpline(Cubic(Line(OnGrid())))), x)

x = 1:0.1:5
plot!(x,bsi.(x))
``````

Thank you. I tried with a Michaelis-Menten (more biologically relevant) but it does not fit:

``````x = collect(0:2.8:60)
y = V
par = [0.0, 1.0]
model = michaelis_menten(x, p) = (x .* p[1]) ./ (x .+ p[2])
fit = curve_fit(model, x, y, par)
P = fit.param
Y = michaelis_menten(x, P)
``````

I am trying with a logistic but it also does not give good results:

model = logistic(x, p) = ((p[1].*x) .* (1 .- (x./p[2])))
fit = curve_fit(model, x, y, par)
``````

With a ‘real’ logistic, I get a bit better but still a lot off:

``````par = [10^7, 0.0, 1.0] # par = [0.0, 1.0]
model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3])))
fit = curve_fit(model, x, y, par)
``````

How can I improve the regression with these functions?

So I’m not intimately familiar with the Michaelis-Menten function, but it might be worth considering adding more parameters (for instance amplitude, width, x-shift, y-shift). And try to rescale the data so the parameters are within a couple of orders of magnitude (like I did with the `1e6` above).

Plotting with linear axes will probably make the behavior more intuitively obvious.
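As a sketch of what “more parameters” could mean here (the four-parameter form and the name `mm4` are illustrative, not a standard model):

``````julia
# Illustrative Michaelis-Menten-like model with amplitude p[1], width p[2],
# x-shift p[3] and y-shift p[4]
mm4(x, p) = p[4] .+ (p[1] .* (x .- p[3])) ./ (p[2] .+ (x .- p[3]))
``````

At `x = p[3]` this returns the baseline `p[4]`, and for large `x` it approaches `p[4] + p[1]`, which is the kind of saturating behavior your data shows.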

It seems like having zeros in your model’s parameters might not make much sense, since, for instance, for `michaelis_menten` you would only get zeros as a result. In any case, the plots you show do not correspond to the `curve_fit` results either; see below, changing 0.0 to 0.2 in `param`:

``````using Plots
using LsqFit

x = collect(0:2.8:60)

param = [0.2, 1.0]
model = michaelis_menten(x, p) = (x .* p[1]) ./ (x .+ p[2])
fit = curve_fit(model, x, model(x,param), param)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))
``````

``````model = logistic(x, p) = ((p[1].*x) .* (1 .- (x./p[2])))
fit = curve_fit(model, x, model(x,param), param)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))
``````

``````param = [10^7, 0.2, 1.0]
model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3])))
fit = curve_fit(model, x, model(x,param), param)
scatter(x,model(x,param))
plot!(x,model(x,fit.param))
``````

I think the problem was in the X data, now I got this fitting:

The logistic model with exponents gives me the best fit. I’ll try now to interpolate.
Thanks

Hello,
assuming the fitting phase is done (I have a function with fitted parameters that represents the observed data), how do I run the interpolation? From Interpolations.jl, it looks like (a) the main process is linear (whereas the curve here is not linear by definition), (b) it requires a matrix (whereas I have a single value to interpolate: 500 000), (c) curved interpolation already has its own schemes (`Constant`, `Linear`, `Quadratic`, and `Cubic`), so why was there a need to do the fitting part, and (d) there is no place for the function I defined in the fitting phase; for instance, the object `itp = interpolate(a, BSpline(Constant()))` takes `a` as the values to be interpolated and already provides the function for the task (`BSpline`).
How can I find the X value for Y=500000 given the function `true_logistic` I defined and the fitted parameters I have found:

``````3-element Vector{Float64}:
2.1647983679173873e6
0.7211511621759724
13.214716462754794
``````

Thank you

If you have a fitted function, you don’t need `Interpolations`. Just do

``````model(x, p) = ...
fit = curve_fit(model, x_data, y_data, p_0)
p = fit.param

new_y = model(new_x, p)
new_ys = model.(new_xs, Ref(p))
``````

At least in physics, curve fitting is strictly better than naive interpolation, because you leverage the knowledge you have about the system to find the most probable intermediate values. And you can get estimates for the quality of fit, and confidence bounds for your predictions.
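For example, `LsqFit` exposes such diagnostics after fitting (a minimal sketch on synthetic data; `stderror` and `confidence_interval` are LsqFit functions, the model and data here are made up):

``````julia
using LsqFit

# Fit a simple two-parameter linear model to synthetic data
model(x, p) = p[1] .* x .+ p[2]
xdata = collect(0.0:0.1:10.0)
ydata = 2.0 .* xdata .+ 1.0 .+ 0.01 .* sin.(xdata)  # small deterministic "noise"
fit = curve_fit(model, xdata, ydata, [1.0, 0.0])

se = stderror(fit)                  # standard errors of the fitted parameters
ci = confidence_interval(fit, 0.05) # 95% confidence intervals
``````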


In this case, though, I need:

``````new_x = model(new_y, p)
``````

if I put the new_y as the level of 500 000 that I want, with the fitted p, I get two million, that is obviously wrong. How do I get X given Y?
Thanks

Can you try using this parametrization, to see what comes out?

f1(x,p) = p[1] ./ (p[2] .* x .^ p[3] .- 1.0)
param = [2.165e6, 10.0, -2.0]
``````

Also, those points look a lot like Bode plots for the amplitude of response systems.

I get:

julia> f1(x,p) = p[1] ./ (p[2] .* x .^ p[3] .- 1.0)
f1 (generic function with 1 method)
julia> param = [2.165e6, 10.0, -2.0]
3-element Vector{Float64}:
2.165e6
10.0
-2.0
julia> fit_x = f1(500000, param)
-2.1650000000866e6
``````

Does it mean that to get X given Y (Y->X) I need the inverse function and then run X->Y?

Yes, given a function `f : x, p -> y`, you’ll need to either

1. Fit `f` to your `x` and `y` data (i.e. find appropriate `p`), then find the inverse `g : y, p -> x` and use those `p` to find `x_n` given `y_n`,
2. Find `g`, then fit it to your `y` and `x` data to find `x_n` given `y_n`, or
3. Fit `f` to your `x` and `y` data, then use some numerical solving method to find `x_n` given `y_n`

The two former methods should be roughly equivalent, and in my opinion superior to the last one as long as `f` is (easily) invertible (always do analytics as far as you can, involving numerics only at the last step). I think I would go with option 1, because it’s more immediately intuitive.
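Option 3 could be sketched with a plain bisection, for completeness (no extra packages; `invert_bisect` is a name made up here, and it assumes the fitted model is monotone and brackets the target on `[lo, hi]`):

``````julia
# Numerically invert a fitted model f at ytarget by bisection on [lo, hi].
# Assumes f is monotone on the bracket and f(lo), f(hi) straddle ytarget.
function invert_bisect(f, ytarget, lo, hi; tol=1e-10)
    flo = f(lo) - ytarget
    for _ in 1:200
        mid = (lo + hi) / 2
        fmid = f(mid) - ytarget
        if sign(fmid) == sign(flo)
            lo, flo = mid, fmid  # root is in the upper half
        else
            hi = mid             # root is in the lower half
        end
        (hi - lo) < tol && break
    end
    return (lo + hi) / 2
end
``````

For a production version, a root-finding package such as Roots.jl would be the more robust choice.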

I used symbolab to find the inverse function of the logistic and followed option 1:

julia> model = true_logistic(x, p) = p[1] ./ (1 .+ exp.(-p[2].*(x.-p[3]))) # set regression
true_logistic (generic function with 1 method)
julia> inverse_logistic(x, p) = ( p[1] .* p[2] .* exp.(-p[2] .* (x .- p[3])) ) ./ ( (1 .+ exp.(-p[2] .* (x .- p[3]))).^2 )
inverse_logistic (generic function with 1 method)
julia> fit = curve_fit(model, x, y, par)
LsqFit.LsqFitResult{Vector{Float64}, Vector{Float64}, Matrix{Float64}, Vector{Float64}}([2.1647983679173873e6, 0.7211511621759724, 13.214716462754794], [57.26888702184834, 65.57394242932473, 97.69294768592027, 169.4743733818491, 308.55542081798865, 593.6144246584468, 1148.52487052776, 2116.5614422187937, 3246.5892025041103, 2642.0985198940616  …  2621.5980348929297, 1631.373413761612, 194.25900849187747, -341.1922586141154, -480.2387717682868, -499.5315029202029, -501.42603116389364, -500.9472773480229, -502.1486880327575, -490.7844598405063], [7.264828417848866e-5 -2078.1127703397933 -113.40640133348667; 8.51168462811274e-5 -2394.275920789786 -132.86858696614163; … ; 1.0000000000034532 7.689947814299762e-5 0.0; 1.0000000000034532 0.0 0.0], true, Float64[])
julia> P = fit.param
3-element Vector{Float64}:
2.1647983679173873e6
0.7211511621759724
13.214716462754794
julia> fit_x = inverse_logistic(Xp, P)
0.0
``````

but the expected value should be a bit more than 11…

Just as a sanity check, try

``````plot(x,y;st=:scatter)
plot!(x,true_logistic.(x,Ref(P)))
plot!(inverse_logistic.(y,Ref(P)),y)
``````

These should all look about the same. Without looking too closely into your inverse function, it looks a bit strange that `p[1]` is in the numerator of both functions; I would expect it to be in the denominator for the inverse function (i.e. if the “amplitude” of `f` is large, the “amplitude” of `f^{-1}` should be small).

Having looked a bit closer at your function, i.e. manually solving `y = a/(1+exp(-b(x-c)))` for `x`, the inverse function should be

g(y,p) = p[3] - log(p[1]/y - 1)/p[2]
``````

No idea what the inverse calculator was doing.
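A quick round-trip check with the fitted parameters quoted earlier in the thread (a sketch; the target value of 500 000 is the threshold from the original question):

``````julia
true_logistic(x, p) = p[1] / (1 + exp(-p[2]*(x - p[3])))
g(y, p) = p[3] - log(p[1]/y - 1)/p[2]

P = [2.1647983679173873e6, 0.7211511621759724, 13.214716462754794]
x_t = g(500_000, P)            # a bit more than 11, as expected
y_back = true_logistic(x_t, P) # should recover 500 000
``````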


Interesting! I used the formula given by Wikipedia, and I changed L=a, k=b, x₀=c. I’ll try your inverse, thank you!