Estimating parameters of a function for curve fitting

I have some data that has missing values. I am trying to fit a curve on this data, using Richard’s Model which is

\begin{aligned} I(t)=& \frac{K}{\left[1+e^{-r a(t-\tau)}\right] \frac{1}{a}} \end{aligned}

where K is fixed and I am trying to estimate a, \tau, and r. So for any time t, I want to be able to compute I(t) and it should be close to my data. I do have missing points in my time series.

I know how to do this in R, with the following code (here \tau is replaced with b.

######
richards.model <- function(prm, K, x) {
  a = prm[['a']]
  b = prm[['b']]
  r = prm[['r']]
  denom <- (1+exp(-r*a*(x-b)))^(1/a)
  res <- K / denom
  return(res)
}

errfct <- function(prm,K,dat) {
  xx <- dat$t
  yy <- richards.model(prm , K=K, x=xx)
  return( sum( (yy-dat$cum.cases)^2, na.rm = T))
}

## compute on all data points
K <- 5500
pfit <- optim(par = list(a=2,b=80,r=0.1), 
              fn = errfct, dat = dat, K=K)

So the optim function calls the errfct with an estimate of parameters. The errfct function simply calculates I(t) with the given parameters, and calculates the sum of square errors. I am assuming the optim function is trying to minimize the least squares error.

Is there an equivalent package in Julia?

You might try Optim.jl.

1 Like

there is LsqFit.jl for just fitting of non linear functions to data.

using Pkg #package manager
Pkg.add("LsqFit") #adding LsqFit to your Packages
using LsqFit #including LsqFit in your enviroment
#p = [a,tau,r] #vector, this is the order of parameters im gonna use
@. model(t,p,K) = p[1]*K/(1+exp(-p[1]*p[3]*(t-p[2])) #variable K to modify later
model0(K) = (t,p) -> model(t,p,K) #this is to create a function with a fixed K
p0 = [0.5,0.5,0.5] #initial parameters
xdata = ..... #your data, in this case t
ydata = ..... # experimental I(t) data
fit = curve_fit(model0, xdata, ydata, p0)

Thanks, this is exactly what I needed except it dosn’t work with missing data.

# fit 
@. richards_model(x, p) = 5500 / (1+exp(-p[3]*p[1]*(x-p[2])))^(1/p[1])
p0 = [2, 80, 0.1]
xdata = Vector(df.time)
ydata = Vector(df.cumcases)
fit = curve_fit(richards_model, xdata, ydata, p0)

because typeof(ydata) = Array{Union{Missing, Int64},1}

The error thrown is

TypeError: non-boolean (Missing) used in boolean context
1 Like

add this:

idx = findall(!ismissing,ydata) #find the indices of all non missing data in ydata
xdata2 = xdata[idx] #xdata but only in the places where ydata is not missing
ydata2 = ydata[idx] #all not missing values in ydata

Thanks, that did it. There is a function in the source code that checks the health of data. I may submit a PR to check for missing data as well.

function check_data_health(xdata, ydata)

    if any(isinf, xdata) || any(isinf, ydata) || any(isnan, xdata) || any(isnan, ydata)

        error("Data contains `Inf` or `NaN` values and a fit cannot be performed")

    end

end
2 Likes

Please do that! I’ll be sure to review it. Ideally we should just support missing values by ignoring the relevant rows, but as a first step I’d be happy to merge an updated data health check.

1 Like

But let me also tell you the following: I know the LM is very popular for NLS, but in my own experience, BFGS directly on the ssr can be just as good an approach, if not better.

1 Like