Best way to take derivatives of unevenly spaced data (with interpolations? discrete derivatives?)

rafael.guerra · January 29, 2021, 7:07pm

@dlfivefifty, you recommend computing the norm (or root mean square errors) for f, df/dt and d2f/dt2? It might well happen that some methods work best for some type of functions and not for others. How do mathematicians bring order to such a wide range of possibilities?

dlfivefifty · January 29, 2021, 7:10pm

Yea. This would be a Sobolev norm.

dlfivefifty · January 29, 2021, 7:14pm

Generally speaking if you know the regularity (ie differentiability) of the function there’s precise results on the convergence rates in Sobolev norms of various methods. Apart from perhaps AAA. Though uneven grids add an interesting wrinkle.

The mathematical areas studying this is “Approximation Theory” and there’s J. Approximation Theory and Constructive Approximation which publish papers on the topic. Mike Powell also has a nice though old book.

Edit: and of course Trefethen’s Approximation Theory and Practice for Chebyshev based methods

macd · January 30, 2021, 12:46am

Just to add here that there is a mistake in my example above in that I used a function defined on [0,1] to fit to a ApproxFun Fun defined on [-1, 1] (oops). I fixed that mistake and also added a comparison the the “exact” integral (starting with a Fun that had adaptively decided how many points to sample). That is now in a IJulia notebook in the ApproxFunExamples repo. You can best see it using the nbviewer here Jupyter Notebook Viewer

JeffreySarnoff · January 30, 2021, 1:18am

When fitting curves piecewise with splines of degree d, expect the fit at the initial d xs and the final d xs to be unsatisfactory. The closer to the first x or the final x, the less likely it is that the y will be fit robustly. Additionally, larger d asks to use higher order polynomials, which may not be what you want. Usually, d is one of (2,3,4,5).

If there is a reasonable way to extrapolate a few values before the first of and after the last of the available data, adjoining them often improves the behavior of the fit for the first and final data points. Ideally, you would prepend d+1 and postpend d+1 extrapolands, fit the piecewise spline, and avoid using all [most] of the artificial information.

To extend the data by the least difficult approach, copy the first and the last points. It is better to use a three-point or four-point extrapolation, extending the x proportionately to its nearest relative distances.

CosmoProf · January 30, 2021, 2:16am

So, I realized that I was anchoring the initial and final limits of my integration (of my key function RCurv) to the randomly selected time points, which is why each time I ran the program the range of numerical integration was slightly different, so the result was slightly different.

If I keep the limits of integration fixed to be the same each time, then I always get a similar final number, and it is very close to the right theoretical number (for the example function I’m using here). The calculation (adapted from jipolanco’s example) works well as long as the Spline Order is at least 4.

As suggested by JeffreySarnoff, I suppose it’s safe to go up to d=5 to be sure, and maybe cut off the regions of the interpolation containing my first and last 6 data points, as being “unreliable” parts of the spline fits.

I think I have a working solution here guys, thanks for your suggestions everyone!

rafael.guerra · January 30, 2021, 12:22pm

The function RCurv(t) has a division by 0 around t0 = 1.20608 and it complicates matters to integrate it starting at t = 1.
In the revised code above, the analyses started at t = 1.21. Used splines of degree 4 (thanks @jipolanco for the correction) for both BSplineKit and Dierckx, and ApproxFun.jl for the integration of RCurv(t) as it seems to produce best results near t0.

jipolanco · January 30, 2021, 12:50pm

That looks great!

Just a note for fairer comparison between spline packages. There are two different conventions for the definition of the spline order. The one used in BSplineKit is the same as in classical textbooks (e.g. de Boor) and is also the one used in Wikipedia. Namely, splines of order k have a polynomial degree k - 1. In particular, cubic splines have order 4.

Dierckx seems to use the other (also reasonable) convention, in which the spline order is the same as the polynomial degree.

In short, order n in Dierckx corresponds to order n + 1 in BSplineKit.

CosmoProf · February 1, 2021, 12:12am

While I have you guys here, once I have a spline representation of a fit to a data set, what’s the easiest way to find the locations & values of any maxima & minima of the function?

Example partial code:

Sfn = spline(BSplineKit.interpolate(timeDatArr, FunctionDatArr, BSplineOrder(5)))
Sd1 = diff(Sfn, Derivative(1)
Sd2 = diff(Sd1, Derivative(1))

How to find locations of zeros of Sd1, and the values of Sfn there?

Thanks!

jipolanco · February 1, 2021, 8:10am

You could use Roots.jl to find zeros of the first derivative of the spline.

Here is a quick example using Newton’s method:

using BSplineKit
using Roots

f(x) = cos(4π * x) * exp(-4x)

xs = 0:0.05:1
S = spline(interpolate(xs, f.(xs), BSplineOrder(6)))

S′ = diff(S, Derivative(1))
S″ = diff(S, Derivative(2))

x₀ = 0.5  # initial guess for Newton method
x = find_zero((S′, S″), x₀, Roots.Newton())  # 0.47547599925831957
S(x)   # 0.1422509042378483
S′(x)  # -1.5265566588595902e-16

rafael.guerra · February 1, 2021, 11:22pm

IntervalRootFinding.jl would be ideal to output all the extrema:

using BSplineKit, IntervalArithmetic, IntervalRootFinding, Plots

f(x) = cos(4*pi*x) * exp(-x)
f′(x) = -exp(-x) * (4*pi*sin(4*pi*x) + cos(4*pi*x))

a, b = -1, 2.5
x = a : 0.02 : b
S = spline(interpolate(x, f.(x), BSplineOrder(6)))
S′ = diff(S, BSplineKit.Derivative(1))

plot(x, f.(x), label="f(x)", lw=2)
plot!(x, S.(x), label="S(x)", ls=:dash)
rint = roots(f′, a..b)      # intervals with roots solutions
rts = mid.(interval.(rint)) # roots taken at each interval midpoint
scatter!(rts, f.(rts), label = "IntervalRootFinding for f′(x)", mc=:red,ms=5)
scatter!(rts, S.(rts), label="BSplineKit S(x) where f′(x)=0", mc=:cyan, ms=3)

IntervalRootFinding

However, it does not seem straightforward to have it consuming the BSplineKit spline functions:

S1d(x) = S′(x)
rint = roots(S1d, a..b)   # intervals with roots
ERROR: TypeError: non-boolean (Interval{Float64}) used in boolean context

dpsanders · February 1, 2021, 11:37pm

Yes, unfortunately interval methods cannot handle functions containing ifs.
You would have to extract and pass each separate piece of the spline to it.

If you use cubic splines then the extrema are given by quadratic equations, so it should be easy (?) to solve (again, on each segment separately).

ApproxFun can also easily do these extrema calculations.

rafael.guerra · February 2, 2021, 7:37pm

The problem of finding all extrema in the analyses interval can be solved with Dierckx.jl for cubic splines.

using Dierckx, Plots

f(x) = cos(4*pi*x) * exp(-x)
f′(x) = -exp(-x) * (4*pi*sin(4*pi*x) + cos(4*pi*x))

a, b = -1, 2.5
x = a : 0.02 : b
spl = Spline1D(x, f.(x), k=4)  # spline of degree 4
df1 = derivative(spl, x)  # 1st derivative
spl1 = Spline1D(x, df1, k=3)   # derivative of deg=3 of spline of deg=4
rts = roots(spl1, maxn=20)

plot(x, f.(x), label="f(x)", lw=3, lc=:cyan)
plot!(x, spl(x), label="Dierckx S(x) of 4th degree", ls=:dash,lc=:blue)
scatter!(rts, f.(rts), label = "f(x) for x in Dierckx S′(x)=0", mc=:red,ms=5)
scatter!(rts, spl(rts), label="Dierckx S(x) where S′(x)=0", mc=:yellow, ms=3)

Dierck_extrema

CosmoProf · February 4, 2021, 2:53am

Hi guys, taking a combination of suggestions from above, I was able to successfully find the extrema (minima/maxima) of my data – which is noisier & rougher than any smooth function – through a repeated combination:

Smoothing the data by taking a rolling average with rollmean (using RollingFunctions); Splining the smoothed data with spline(BSplineKit.interpolate and differentiating the Spline with diff (using BSplineKit); taking another rolling average to smooth the derivative; Splining the smoothed derivative, and finding its zeros (locating the extrema) with find_zeros (using Roots).

A complicated process, but it seems to work; thanks for your suggestions everybody!

rafael.guerra · February 4, 2021, 7:48am

According to the Numerical Recipes, when we are fitting data to a parametric model, it is almost always better to use raw data than to use data that have been pre-processed by a smoothing procedure. It would be best to fit a physics derived model to the experimental data, if applicable.

CosmoProf · February 15, 2021, 11:55pm

Hi Raphael, I would like to know more about that proscription, why & when it’s applicable.

But the original “data” I have is already very pre-processed and choppy (taken from an irregularly-sampled, step-wise discrete process), so doing anything to it without smoothing first seems like a lost cause.

Thanks for any further info.

stevengj · February 16, 2021, 12:00am

You can do a fit without smoothing first — the fit itself is a smoothing procedure.

Jake · April 30, 2021, 12:44pm

This post is excellent for ideas to try a number of smoothing spline approaches.

stevengj · August 17, 2022, 3:16pm

A post was split to a new topic: Generating finite-difference stencils

Topic		Replies	Views
High-order derivatives of sampled data General Usage finitediff , splines	9	2308	September 1, 2021
Smoothed Spline Interpolation compatible with ForwardDiff? Numerics question , forwarddiff	6	754	October 23, 2023
Arc length for a spline General Usage question , quadgk , splines	27	1760	February 7, 2025
Best method for calculating the second derivative of 1D array Performance question , package , performance , differentiation , splines	11	2578	October 20, 2021
Evaluate integral on many points (Cubature.jl ?) Numerics	44	4692	July 26, 2017

Best way to take derivatives of unevenly spaced data (with interpolations? discrete derivatives?)

Related topics