Adding logarithmic or polynomial regression to series

Good Day:

I have been attempting to fit a logarithmic or
polynomial trendline to the following series:

I am currently using GLM.jl and plotting with StatsPlots.jl, but I do
not believe there are parameters I can pass to plot!(kwarg…) to add
such a trendline to the existing series.

Will I need to change the x and y scales to :log10? Do I need to
define p0? Any suggestions?

Thank you,

Please post code :grinning:

I don’t understand your question: are you asking how to fit a polynomial regression in GLM, or have you already done that and are struggling to plot a linear and a polynomial model on the same graph?

@nilshg Thanks for your questions, Nils.

I am struggling to overlay a polynomial or logarithmic
regression on the series. The linear model here
does not quite fit the data; it was simply the built-in
option of scatter( … smooth=:true).
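A logarithmic trendline of the form y ≈ a + b*log(x) is still linear in the unknowns a and b, so it can be fitted by ordinary least squares on a design matrix with columns 1 and log(x). No starting guess p0 is needed for that; p0 matters for iterative nonlinear fitters such as LsqFit.jl's curve_fit. Setting the axes to :log10 only changes how the plot is drawn, not what is fitted. A minimal stdlib-only sketch; the function name fit_log and the synthetic data are hypothetical:

```julia
using LinearAlgebra

"""
    fit_log(x, y) -> (a, b)

Least-squares fit of y ≈ a + b*log(x). The model is linear in a and b,
so one linear solve is enough and no starting guess is required.
"""
function fit_log(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    all(>(0), x) || throw(ArgumentError("the log model requires x > 0"))
    X = [ones(length(x)) log.(x)]   # design matrix: intercept column and log(x)
    a, b = X \ y                     # QR-based least-squares solution
    return a, b
end

# Synthetic, noise-free data generated from y = 2 + 3*log(x) (illustration only)
x = [0.6, 0.7, 0.8, 0.9, 1.0]
y = 2 .+ 3 .* log.(x)
a, b = fit_log(x, y)   # recovers a ≈ 2, b ≈ 3
```

To overlay the result on an existing scatter plot, something like plot!(xs, a .+ b .* log.(xs)) over a sorted grid xs should work, assuming all x values are positive, as the Sales values here are.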

@ptoche – The code I used to generate the graph
was:

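using PlotlyJS
import StatsPlots as stp   # `stp` is an alias for StatsPlots (Julia ≥ 1.6)

stp.scatter(DF[!, :Sales], DF[!, :Δ],
            label="Intersection", xlab="Sales",
            ylab="Cases",
            xlim=(0.6, 1.01),
            smooth=true,        # overlays a linear (least-squares) regression line
            lw=2,
            framestyle=:axes)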

Not sure what the problem is, but just in case it may help:

using GLM, DataFrames, StatsPlots; gr(dpi=600)

# automatically digitized with Chrome's WebPlotDigitizer app:
m = [ 0.62  2.25e-9;
      0.64  2.43e-9;
      0.66  2.65e-9;
      0.74  2.87e-9;
      0.77  2.08e-9;
      0.78  3.08e-9;
      0.81  3.25e-9;
      0.88  3.38e-9;
      0.96  3.45e-9;
      0.99  3.49e-9
]

DF = DataFrame(Sales = m[:,1], Δ = m[:,2])

# scatter plot; smooth=true adds the built-in linear regression line (red)
scatter(DF[!, :Sales], DF[!, :Δ],
    label="Input data", xlab="Sales",
    ylab="Cases",
    xlim=(0.6, 1.01),
    smooth=true,
    lw=2, lc=:red,
    framestyle=:axes,
    legend=:outertop
)

# the same straight line, fitted explicitly with GLM (ordinary least squares)
pfit = lm(@formula(Δ ~ 1 + Sales), DF)
a, b = round.(coef(pfit), sigdigits=3)

plot!(DF.Sales, predict(pfit), lc=:cyan, lw=3, ls=:dot, label="Cases Δ = $a + $b * Sales")
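With a single predictor, lm(@formula(Δ ~ 1 + Sales), DF) is ordinary least squares, whose coefficients have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A stdlib-only check on the digitized values above; the variable names are illustrative:

```julia
using LinearAlgebra, Statistics

# Digitized data from the post above (Sales, Δ)
sales = [0.62, 0.64, 0.66, 0.74, 0.77, 0.78, 0.81, 0.88, 0.96, 0.99]
delta = [2.25, 2.43, 2.65, 2.87, 2.08, 3.08, 3.25, 3.38, 3.45, 3.49] .* 1e-9

# Closed-form simple linear regression
b = cov(sales, delta) / var(sales)   # slope
a = mean(delta) - b * mean(sales)    # the line passes through (mean(x), mean(y))

# Same coefficients from a least-squares solve on the design matrix [1 x]
a2, b2 = [ones(length(sales)) sales] \ delta
```

Both routes should agree up to rounding, which is a quick way to see that the straight line is a property of the model, not of the plotting call.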


@rafael.guerra Thank you for this Rafael – it
inadvertently answered a question that has
been lingering on my mind.

I think the curve/regression is straight
because of the scale. I wanted to
produce a logistic or polynomial curve to better
fit the series.

@YummyPampers2, you have a significant outlier in your data set, which raises the question of the accuracy of the other points. If their error bars are large and there is no a priori model for the data, could the straight line be an honest fit? If you have the data uncertainties, a weighted regression could be performed.
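If per-point uncertainties σᵢ are available, weighted least squares minimizes Σ wᵢ (yᵢ - a - b*xᵢ)² with wᵢ = 1/σᵢ², which is the same as ordinary least squares after scaling each row by 1/σᵢ. A minimal sketch under that assumption; wls_line and the σ values below are hypothetical, since no uncertainties were posted in this thread:

```julia
using LinearAlgebra

"""
    wls_line(x, y, σ) -> (a, b)

Weighted least-squares fit of y ≈ a + b*x with weights wᵢ = 1/σᵢ²,
where σᵢ is the (assumed known) uncertainty of yᵢ.
"""
function wls_line(x, y, σ)
    sw = 1 ./ σ                     # √wᵢ = 1/σᵢ
    X = [ones(length(x)) x]         # design matrix for a + b*x
    a, b = (sw .* X) \ (sw .* y)    # scale each row by √wᵢ, then ordinary LS
    return a, b
end

# Synthetic example: an exact line y = 1 + 2x with one displaced point
x = collect(1.0:10.0)
y = 1 .+ 2 .* x
y[5] += 5
σ = ones(10)
σ[5] = 1e6                          # hypothetical: that point is very uncertain
a, b = wls_line(x, y, σ)            # close to (1.0, 2.0): the point barely counts
```

With all σᵢ equal this reduces to the ordinary fit; the weights only matter when the uncertainties differ between points.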


Wow! Nice!

@YummyPampers2, your dataset is very small: I would not recommend non-linear regression for such a small sample unless you have very good reasons. You may want to check the docs for something along the lines of logit = glm(@formula(Δ ~ 1 + Sales), DF, Binomial(), LogitLink()), but do read up on this carefully beforehand to make sure you want to do that, e.g. Stock and Watson, around page 328.


You may want to play with the excellent Polynomials.jl package.

By removing the outlier you may get a decent fit, but polynomials can in general be dangerous for extrapolation, especially high-order ones:

using Polynomials
deleteat!(DF, 5)        # remove row 5 (the outlier at Sales = 0.77) from DF in place
(; Sales, Δ) = DF;      # Julia 1.7 property destructuring
p = Polynomials.fit(Sales, Δ, 2)   # least-squares fit of a 2nd-order polynomial
scatter(Sales, Δ, label="Input without outlier", legend=:topleft)
plot!(Sales, p.(Sales), lc=:green, lw=2, ls=:dash, label="2nd order Polynomial")
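Polynomials.fit performs a least-squares fit. To see what that amounts to without the package, the same degree-2 fit can be sketched by solving against a Vandermonde matrix. This is an illustration, not Polynomials.jl's implementation, and polyfit/polyval are hypothetical names:

```julia
using LinearAlgebra

"""
    polyfit(x, y, deg) -> c

Least-squares polynomial fit with coefficients in increasing degree:
y ≈ c[1] + c[2]*x + ... + c[deg+1]*x^deg.
"""
function polyfit(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, deg::Integer)
    length(x) > deg || throw(ArgumentError("need more points than the degree"))
    V = [xi^k for xi in x, k in 0:deg]   # Vandermonde matrix, one column per power
    return V \ y                          # least-squares solution
end

# Evaluate the fitted polynomial at t (Horner's scheme via Base.evalpoly)
polyval(c, t) = evalpoly(t, c)

# Synthetic, noise-free data from y = 1 - 2x + 3x^2 (illustration only)
x = [0.6, 0.7, 0.8, 0.9, 1.0]
y = 1 .- 2 .* x .+ 3 .* x .^ 2
c = polyfit(x, y, 2)       # ≈ [1, -2, 3]
```

For higher degrees the Vandermonde matrix becomes badly conditioned, so centering and scaling x first helps; that numerical issue comes on top of the extrapolation risk of high-order polynomials.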


@rafael.guerra – thank you for this. Would you recommend
using OutlierDetection.jl or something similar (e.g.,
LinRegOutliers.jl) to remove the outlier?

When I apply your instructions, I am generating

I think the original outlier is at index 10.

Testing now. Any suggestions short of
calculating Cook’s distance and performing
some quartile-based exclusion?
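For reference, Cook's distance for a straight-line fit can be computed directly from the residuals eᵢ and the leverages hᵢ (the diagonal of the hat matrix): Dᵢ = eᵢ² / (p * s²) * hᵢ / (1 - hᵢ)², with p fitted parameters and s² = Σe² / (n - p). A stdlib sketch; cooks_distance and the synthetic data are hypothetical. A commonly quoted rule of thumb flags points with Dᵢ > 4/n, but that is a heuristic, not a test:

```julia
using LinearAlgebra

"""
    cooks_distance(x, y) -> D

Cook's distance of each point for the straight-line fit y ≈ a + b*x.
"""
function cooks_distance(x::AbstractVector, y::AbstractVector)
    n = length(x)
    X = [ones(n) x]                   # design matrix
    p = size(X, 2)                    # number of fitted parameters
    β = X \ y                         # ordinary least-squares coefficients
    e = y .- X * β                    # residuals
    h = diag(X * inv(X' * X) * X')    # leverages (diagonal of the hat matrix)
    s2 = sum(abs2, e) / (n - p)       # residual variance estimate
    return @. e^2 / (p * s2) * h / (1 - h)^2
end

# Synthetic example: exact line y = 1 + 2x with point 5 displaced
x = collect(1.0:10.0)
y = 1 .+ 2 .* x
y[5] += 5
D = cooks_distance(x, y)
worst = argmax(D)                     # index of the most influential point
```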

Thanks,


Take a look at the RAFF.jl package, which looks interesting for this problem, as it allows fitting a nonlinear model robustly in the presence of outliers.
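As a much cruder illustration of the idea of fitting while ignoring outliers, here is a brute-force trimmed least-squares sketch: leave out each point in turn and keep the fit with the smallest residual sum of squares. This is not RAFF's algorithm, trimmed_fit is a hypothetical name, and the brute force only makes sense for very small data sets like this one:

```julia
using LinearAlgebra

"""
    trimmed_fit(x, y; design) -> (β, dropped)

Leave-one-out trimmed least squares: fit the model n times, each time
without point i, and keep the fit with the smallest residual sum of squares.
`design(x)` builds the design matrix; the default is a straight line a + b*x.
"""
function trimmed_fit(x, y; design = x -> [ones(length(x)) x])
    best_ssr, best_β, dropped = Inf, Float64[], 0
    for i in eachindex(x)
        keep = [j for j in eachindex(x) if j != i]   # all points except i
        X = design(x[keep])
        β = X \ y[keep]                               # least-squares fit without i
        ssr = sum(abs2, y[keep] .- X * β)             # residual sum of squares
        if ssr < best_ssr
            best_ssr, best_β, dropped = ssr, β, i
        end
    end
    return best_β, dropped
end

# Synthetic example: exact line y = 1 + 2x with point 5 displaced
x = collect(1.0:10.0)
y = 1 .+ 2 .* x
y[5] += 5
β, dropped = trimmed_fit(x, y)    # drops point 5 and recovers β ≈ [1, 2]
```

The design keyword lets the same routine fit the logarithmic model as well, e.g. design = x -> [ones(length(x)) log.(x)] for positive x.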


@rafael.guerra Excellent!

Thank you for sharing the
knowledge – will test out
some use cases and
report.

Best regards,