# Estimating parameters of a function for curve fitting

I have some data that has missing values. I am trying to fit a curve on this data, using Richard’s Model which is

\begin{aligned} I(t)=& \frac{K}{\left[1+e^{-r a(t-\tau)}\right] \frac{1}{a}} \end{aligned}

where K is fixed and I am trying to estimate a, \tau, and r. So for any time t, I want to be able to compute I(t) and it should be close to my data. I do have missing points in my time series.

I know how to do this in R, with the following code (here \tau is replaced with b.

######
richards.model <- function(prm, K, x) {
a = prm[['a']]
b = prm[['b']]
r = prm[['r']]
denom <- (1+exp(-r*a*(x-b)))^(1/a)
res <- K / denom
return(res)
}

errfct <- function(prm,K,dat) {
xx <- dat$t yy <- richards.model(prm , K=K, x=xx) return( sum( (yy-dat$cum.cases)^2, na.rm = T))
}

## compute on all data points
K <- 5500
pfit <- optim(par = list(a=2,b=80,r=0.1),
fn = errfct, dat = dat, K=K)


So the optim function calls the errfct with an estimate of parameters. The errfct function simply calculates I(t) with the given parameters, and calculates the sum of square errors. I am assuming the optim function is trying to minimize the least squares error.

Is there an equivalent package in Julia?

You might try Optim.jl.

1 Like

there is LsqFit.jl for just fitting of non linear functions to data.

using Pkg #package manager
using LsqFit #including LsqFit in your enviroment
#p = [a,tau,r] #vector, this is the order of parameters im gonna use
@. model(t,p,K) = p*K/(1+exp(-p*p*(t-p)) #variable K to modify later
model0(K) = (t,p) -> model(t,p,K) #this is to create a function with a fixed K
p0 = [0.5,0.5,0.5] #initial parameters
xdata = ..... #your data, in this case t
ydata = ..... # experimental I(t) data
fit = curve_fit(model0, xdata, ydata, p0)


1 Like

Thanks, this is exactly what I needed except it dosn’t work with missing data.

# fit
@. richards_model(x, p) = 5500 / (1+exp(-p*p*(x-p)))^(1/p)
p0 = [2, 80, 0.1]
xdata = Vector(df.time)
ydata = Vector(df.cumcases)
fit = curve_fit(richards_model, xdata, ydata, p0)


because typeof(ydata) = Array{Union{Missing, Int64},1}

The error thrown is

TypeError: non-boolean (Missing) used in boolean context

1 Like

idx = findall(!ismissing,ydata) #find the indices of all non missing data in ydata
xdata2 = xdata[idx] #xdata but only in the places where ydata is not missing
ydata2 = ydata[idx] #all not missing values in ydata


Thanks, that did it. There is a function in the source code that checks the health of data. I may submit a PR to check for missing data as well.

function check_data_health(xdata, ydata)

if any(isinf, xdata) || any(isinf, ydata) || any(isnan, xdata) || any(isnan, ydata)

error("Data contains Inf or NaN values and a fit cannot be performed")

end

end

2 Likes

Please do that! I’ll be sure to review it. Ideally we should just support missing values by ignoring the relevant rows, but as a first step I’d be happy to merge an updated data health check.

1 Like

But let me also tell you the following: I know the LM is very popular for NLS, but in my own experience, BFGS directly on the ssr can be just as good an approach, if not better.

1 Like

I am looking for similar curve fitting (minimization of SSE) Julia codes. Do you manage to get this working? Thanks.

What’s wrong with the examples here?

1 Like

This example is perfectly fine. It also showed method to tackle missing data. What I’m missing is the original question which is to translate the R code into Julia equivalent (e.g. using Optim.jl, GLM.jl or JuMP.jl). Pardon me, I am Julia novice.

Much of my experiences is from Excel solver. I also use similar approach (min SSE) in R. I did use curve_fitting_solver_function, e.g. curve_fit() function to solve equations - did that in Python using SciPy. What I do like about minimization of SSE is that I can modify the minimization function, e.g. to MLE or have flow control (if-else).

For those interested, below is an example of SSE minimization using solver in Julia. It can be easily modified for the posted question.

using Optim, Gadfly, Cairo

# Julia ver. 1.4, Gadfly ver. 1.2.1, Cairo ver. 1.0.2

# y = f(x1, x2) = β_1*x1 + β_2*x2 + β_3 # simply a function
# β_1, β_2 & β_3 are parameters to be solved by the Optim solver
# x1 and x2 are the variables - indepedent variables = inputs
# y is the dependent variable = output

# input data
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x2 = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
#y = 1.0 + 2.0 * x + [-0.3, 0.3, -0.1]
# output data
y = [3.0, 6.0, 11.0, 18.0, 27.0, 38.0, 51.0, 66.0, 83.0, 102.0]

# define function to fit
function fun_y(betas, x1, x2)
z = betas * x1^2 + betas * x2^2 + betas
return z
end

# define the sum squared errors
function sqerror(betas)
err = 0.0
for i in 1:length(x)
#pred_i = betas + betas * x[i]^2
pred_i = fun_y(betas, x1[i], x1[i])
err += (y[i] - pred_i)^2
end
return err
end

# run the solver
res = optimize(sqerror, [0.0, 0.0, 0.0], LBFGS())
println(res)
par = Optim.minimizer(res)

println(par)

# compute the fitted parameters and get the predicted result
pred_i = []
for i in 1:length(y)
new_i = fun_y(par, x1[i], x2[i]) #par + par * x[i] ^2
push!(pred_i, new_i)
end

# put into Dataframe for plotting
df = DataFrame()
df[!, :x1] = x1
df[!, :x2] = x2
df[!, :y] = y
df[!, :yy] = pred_i

# plotting
p1 = plot(df, x="x1", y="y",
Geom.point,
layer(df, x="x1", y="yy", Geom.line, Theme(default_color=colorant"red")),
#layer(df, x="x2", y="yy", Geom.line, Theme(default_color=colorant"red")),
Coord.Cartesian(ymin=0, ymax=110, xmin=0, xmax=12, yflip=false),
)

curr_jl_dir = @__DIR__
draw(PDF(curr_jl_dir * "\\y_x1.pdf", 5inch, 5inch), p1)

p2 = plot(df, x="x2", y="y",
Geom.point,
layer(df, x="x2", y="yy", Geom.line, Theme(default_color=colorant"red")),
Coord.Cartesian(ymin=0, ymax=110, xmin=0, xmax=12, yflip=false),
)

draw(PDF(curr_jl_dir * "\\y_x2.pdf", 5inch, 5inch), p2)

p3 = plot(df, x="y", y="yy",
Geom.point,
)

draw(PDF(curr_jl_dir * "\\y_yy.pdf", 5inch, 5inch), p3)

1 Like