Non-linear analysis

Having a bit of an issue and I don’t quite know where to turn.

I have insect emergence data that fits a Gompertz model quite well.

Example Data
Row │ ED Wasp
│ Float64 Float64
─────┼──────────────────
1 │ 14.375 2.0
2 │ 14.875 1.0
3 │ 15.375 10.0
4 │ 15.5417 11.0
5 │ 15.7083 1.0
6 │ 15.875 9.0
7 │ 16.5417 351.0
8 │ 16.7083 4.0
9 │ 16.875 153.0
10 │ 17.375 36.0
11 │ 17.5417 110.0
12 │ 17.7083 4.0
13 │ 18.375 55.0
14 │ 18.5417 105.0
15 │ 18.7083 1.0
16 │ 19.375 8.0
17 │ 19.5417 41.0
18 │ 19.7083 3.0
19 │ 20.5417 8.0
20 │ 21.5417 5.0
21 │ 21.7083 1.0
22 │ 21.875 1.0
23 │ 22.375 3.0
24 │ 22.5417 5.0
25 │ 24.5417 2.0

I am currently using RCall to determine the coefficients of the Gompertz Model based on the data. The equation R uses is below

y~a*exp(-b*exp(-c*x)
where
a is the asymptote
b is displacement along the x axis
c is the growth rate

all is well and good to this point.

the kink comes from a paper i found that uses the SpSS version of the equation

`y~aexp(-exp(-c(x-m)))’
where
a is the asymptote
b is an implied 1
c is slope of the growing part of the curve
m is the inflection point or where the growth goes exponetial.

What I would like to do is use the SPSS version without having to purchase and learn how to use SPSS to determine the fit of a line with the data.

I have no idea if any of this makes sense but I am happy to clarify and answer any questions.

Thank you so much
Mike

There are various libraries for fitting non-linear functions. Maybe give LsqFit.jl a try?

The expressions used by R and SPSS are equivalent. You can easily derive the SPSS coefficients from the R coefficients.

Suppose a, b, and c are the coefficients found from your R call, and that a', c', and m' are the coefficients that would be returned by SPSS. It is simple to show that a' = a, c' = c, and m' = \frac{\text{ln} \, b}{c}.

How should your data be understood? Like:

D = [14.375 2.0
14.875 1.0
15.375 10.0
15.5417 11.0
15.7083 1.0
15.875 9.0
16.5417 351.0
16.7083 4.0
16.875 153.0
17.375 36.0
17.5417 110.0
17.7083 4.0
18.375 55.0
18.5417 105.0
18.7083 1.0
19.375 8.0
19.5417 41.0
19.7083 3.0
20.5417 8.0
21.5417 5.0
21.7083 1.0
21.875 1.0
22.375 3.0
22.5417 5.0
24.5417 2.0]

??

If I plot these data,

using Plots
scatter(D[:,1], D[:,2])

or

Does this make sense?

no because I am looking at the cumlative information

so I sent raw data and forgot to explain that part. Too many things going on at the same time.

15.7083 1.0
15.875 10.0
16.5417 361.0
16.0783 365.0
etc

Good gawd Peter, you have provided the answer that myself (not exactly a high bar here), a tenured professor of Mathematics, a Ph.D. student in Mathematics and a LOT of reading on the internet couldn’t provide. The conversions between the two equations. Thank you!

Edit: I noticed that I really kinda botched the the SPSS version. It should read
’ y~a+c*(exp(-exp(-b*(t-m)))’

where
a+c = asymptote
b= slope of the growing part of the curve
m= inflexion point

2 Likes

A “quick and dirty” use of LsqFit.jl gave this:

Of course, I don’t know what your data shows (wasp arrival?? after number of days??), so you would probably use different text for the plot + labels + abscissa.

Some code:

# Importing packages
using LsqFit, Plots

# Defining some parameters for plotting
LW1 = 2.5
LC1 = :darkorange
LC2 = :red
LC3 = :blue
LS1 = :solid
LS2 = :dash

# Data
data = [14.375 2.0
14.875 1.0
15.375 10.0
15.5417 11.0
15.7083 1.0
15.875 9.0
16.5417 351.0
16.7083 4.0
16.875 153.0
17.375 36.0
17.5417 110.0
17.7083 4.0
18.375 55.0
18.5417 105.0
18.7083 1.0
19.375 8.0
19.5417 41.0
19.7083 3.0
20.5417 8.0
21.5417 5.0
21.7083 1.0
21.875 1.0
22.375 3.0
22.5417 5.0
24.5417 2.0]

x = data[:,1]
y = cumsum(data[:,2])
fg = plot(x,y; seriestype=:steppre, lw=LW1, lc=LC1, label="cumulative wasp arrival")

# Gompertz model
function model_1(x,p)
    a,b,c = p
    return a*exp.(-b*exp.(-c*x))
end

p0_1 = [900.0,16,1]
fit_1 = curve_fit(model_1, x, y, p0_1)
param_1 = fit_1.param

cov_1 = estimate_covar(fit_1)

conf_int_1 = confidence_interval(fit_1,0.1)

y_hat_1(x) = model_1(x,param_1)

x_range = range(minimum(x),maximum(x),length=100)

plot!(fg, x_range,y_hat_1(x_range); lw=LW1, lc=LC2, label="Gompertz model")

# SpSS model
function model_2(x,p)
    a,b,c,m = p
    return a .+ c*exp.(-exp.(-b*(x .- m)))
end

p0_2 = [600.0, 1, 300, 16]

fit_2 = curve_fit(model_2, x, y, p0_2)
param_2 = fit_2.param

cov_2 = estimate_covar(fit_2)

conf_int_2 = confidence_interval(fit_2,0.1)

y_hat_2(x) = model_2(x,param_2)

plot!(fg,x_range,y_hat_2(x_range); lw=LW1, lc=LC3, ls=LS2,label="SpSS model")
plot!(xlabel="days", title="Cumulative wasp arrival", frame=:box, legend=:bottomright)
2 Likes

Interesting. So using
fit.param

would give me the best estimated parameters (a,b,c,m) for the curve that it calculated?

Of course this invalidates the relation I found earlier.

Yes.

figured it did, Sorry

in that order?

Not quite sure what you mean, but I assume that you mean fit.param with fit first and param last? OK – let me be a little bit more precise.

  1. Function curve_fit with arguments (i) model function (e.g., model_1), (ii) “x-data” (e.g., x), (iii) “y-data” (e.g., y), and (iv) initial parameter guess (e.g., p0_1) in that order, returns a data structure.
  2. By writing fit_1 = curve_fit(model_1, x, y, p0_1), I give the returned data structure the name fit_1.
  3. The returned data structure has a so-called field with name param. This field contains the fitted parameter. The syntax for getting the content of field param in the data structure (which I have named fit_1), is fit_1.param.

So – no matter what I choose to call the fitted data structure, the field name is always param – the field name was chosen by the developers of LsqFit and we have to follow their choice. So if I instead had called the returned data structure Gompertz_estimate by using statement

Gompertz_estimate = curve_fit(model_1,x,y,p0_1)

the field name containing the estimated parameter is still the same, and is extracted (this time) by statement:

Gompertz_estimate.param

right I understand this but when I look at the values in fit_2 I get:
`fit_1

-15.399481864169104
1.1706823975674077
933.2300788361196
16.52300592020053

I don’t which value is for which parameter. Is a=-15.399? or is that m?

Ah. The order is given by the order in models, e.g.:

# Gompertz model
function model_1(x,p)
    a,b,c = p
    return a*exp.(-b*exp.(-c*x))
end

and

# SpSS model
function model_2(x,p)
    a,b,c,m = p
    return a .+ c*exp.(-exp.(-b*(x .- m)))
end

So for model_2, a,b,c,m = p implies that

  • a = -15.399…
  • b = 1.170…
  • c = 933.23…
  • m = 16.523…

that’s what I thought but when I use those parameters back into the equation I don’t get even close to the right answer for any given x in the data I posted.

BTW this is wasps emerging from a gall data.

1 Like

What does your function look like? The one you put the parameters back into?

I used the code you posted and am using model 2.

OK… again, here are the curves:

The (dark) orange curve is the cumulative of the output data you provided. Clearly, the fit is not good around abscissa value 16 (as an example). Also, at ca. 17, there is an offset. For larger abscissa values than 20, the fit is decent, I’d say.

It is not possible to work magic: the fitted curves (= the models) will never be equal to reality (the orange curve). But they may give an indication.

It may be possible to get better fit for lower abscissa values by using a different loss function than what LsqFit uses (by default – don’t know if it is possible to change that). Or by using different models.

I’m not a domain expert, so I don’t know what models are used in your field.

If you want to try with other models, you can do so. Below, I’ve added the logistic function:

1 Like