 # Estimating parameters of a function for curve fitting

I have some data that has missing values. I am trying to fit a curve on this data, using Richard’s Model which is

\begin{aligned} I(t)=& \frac{K}{\left[1+e^{-r a(t-\tau)}\right] \frac{1}{a}} \end{aligned}

where K is fixed and I am trying to estimate a, \tau, and r. So for any time t, I want to be able to compute I(t) and it should be close to my data. I do have missing points in my time series.

I know how to do this in R, with the following code (here \tau is replaced with b.

######
richards.model <- function(prm, K, x) {
a = prm[['a']]
b = prm[['b']]
r = prm[['r']]
denom <- (1+exp(-r*a*(x-b)))^(1/a)
res <- K / denom
return(res)
}

errfct <- function(prm,K,dat) {
xx <- dat$t yy <- richards.model(prm , K=K, x=xx) return( sum( (yy-dat$cum.cases)^2, na.rm = T))
}

## compute on all data points
K <- 5500
pfit <- optim(par = list(a=2,b=80,r=0.1),
fn = errfct, dat = dat, K=K)


So the optim function calls the errfct with an estimate of parameters. The errfct function simply calculates I(t) with the given parameters, and calculates the sum of square errors. I am assuming the optim function is trying to minimize the least squares error.

Is there an equivalent package in Julia?

You might try Optim.jl.

1 Like

there is LsqFit.jl for just fitting of non linear functions to data.

using Pkg #package manager
using LsqFit #including LsqFit in your enviroment
#p = [a,tau,r] #vector, this is the order of parameters im gonna use
@. model(t,p,K) = p*K/(1+exp(-p*p*(t-p)) #variable K to modify later
model0(K) = (t,p) -> model(t,p,K) #this is to create a function with a fixed K
p0 = [0.5,0.5,0.5] #initial parameters
xdata = ..... #your data, in this case t
ydata = ..... # experimental I(t) data
fit = curve_fit(model0, xdata, ydata, p0)



Thanks, this is exactly what I needed except it dosn’t work with missing data.

# fit
@. richards_model(x, p) = 5500 / (1+exp(-p*p*(x-p)))^(1/p)
p0 = [2, 80, 0.1]
xdata = Vector(df.time)
ydata = Vector(df.cumcases)
fit = curve_fit(richards_model, xdata, ydata, p0)


because typeof(ydata) = Array{Union{Missing, Int64},1}

The error thrown is

TypeError: non-boolean (Missing) used in boolean context

1 Like

idx = findall(!ismissing,ydata) #find the indices of all non missing data in ydata
xdata2 = xdata[idx] #xdata but only in the places where ydata is not missing
ydata2 = ydata[idx] #all not missing values in ydata


Thanks, that did it. There is a function in the source code that checks the health of data. I may submit a PR to check for missing data as well.

function check_data_health(xdata, ydata)

if any(isinf, xdata) || any(isinf, ydata) || any(isnan, xdata) || any(isnan, ydata)

error("Data contains Inf or NaN values and a fit cannot be performed")

end

end

2 Likes

Please do that! I’ll be sure to review it. Ideally we should just support missing values by ignoring the relevant rows, but as a first step I’d be happy to merge an updated data health check.

1 Like

But let me also tell you the following: I know the LM is very popular for NLS, but in my own experience, BFGS directly on the ssr can be just as good an approach, if not better.

1 Like