Monotonic smoothing of noisy data

rafael.guerra · January 18, 2023, 11:02pm

Is there a package in Julia, or a simple algorithm, to perform monotonic smoothing of noisy data?

NB:
it is assumed that we do not know of any monotonic function or model that could fit the data.

rafael.guerra · January 19, 2023, 12:21am

A much more basic approach to this problem than the monotonic smoothing splines sought, and not as good, would be to use a monotonic moving average:

using Statistics

function monotonic_moving_average(A::AbstractArray, m::Int; direction=:Increasing)
    out = copy(A)
    Ifirst, Ilast = firstindex(A), lastindex(A)
    out[Ifirst] = mean(view(A, Ifirst:Ifirst+m÷2))
    for I in 2:Ilast
        x1 = mean(view(A, max(Ifirst, I-m÷2):min(Ilast, I+m÷2)))
        x0 = out[I - 1]
        out[I] = ifelse((x0 <= x1 && direction == :Decreasing) || (x0 >= x1 && direction == :Increasing), x0, x1)
    end
    return out
end

using Plots
x = -10:0.1:10
y = atan.(x) + 0.1*rand(length(x))
scatter(x, y, ms=2)
plot!(x, monotonic_moving_average(y, 11), lw=2)

scatter(x, -y, ms=2)
plot!(x, monotonic_moving_average(-y, 11; direction=:Decreasing), lw=2)

baggepinnen · January 19, 2023, 2:53am

There’s this package GitHub - ajtulloch/Isotonic.jl but it looks very old. Otherwise Convex.jl should allow you to implement isotonic regression very easily.

baggepinnen · January 19, 2023, 8:33am

Did you look at the link? It does not fit any model, the only thing isotonic regression does it to smooth the data such that it only increases or decreases.

baggepinnen · January 19, 2023, 8:39am

Here’s the trivial Convex.jl code

using Convex, SCS

N = 50
x = randn(N) .+ LinRange(0, 1, N)

y = Convex.Variable(N)
p = minimize(sumsquares(y - x))
p.constraints += y[2:end] >= y[1:end-1]
Convex.solve!(p, SCS.Optimizer)

plot([x y.value])

With convex optimization, you can easily tune the fit as well, here’s your example and with some smoothing realized by penalizing sumsquares(diff(y)))

t = -10:0.1:10
x = atan.(t) + 0.1*rand(length(t))

Base.diff(y::Convex.Variable) = y[2:end] - y[1:end-1]

y = Convex.Variable(length(t))
p = minimize(sumsquares(y - x) + 10sumsquares(diff(y)))
p.constraints += y[2:end] >= y[1:end-1]
Convex.solve!(p, SCS.Optimizer)

plot([x y.value])

rafael.guerra · January 19, 2023, 9:31am

Sorry Fredrik for overlooking the link provided (I was on my cell phone and stopped at the word “regression”).

I tested the different methods proposed and plotted them below.

The isotonic_regression!() doesn’t seem to smooth the data sufficiently.
The Convex method seems to be real caviar. Thank you.
I will mark it as a solution, in the absence of monotonic smoothing splines.

rafael.guerra · January 19, 2023, 8:47pm

@baggepinnen, just to point out that the convex solution is difficult to handle for a large number of points (e.g. 20,000), because it becomes slow and the choice of the regularization parameter is not obvious.

Dan · January 20, 2023, 4:11am

Here is a method which should be really quick, but less optimal than the convex optimization method. Perhaps this would be enough for the kind of noise in your application.

using Plots, BSplines

function iso(y)
    forwardmax = accumulate(max, y)
    backwardmin = reverse(accumulate(min, reverse(y)))
    meanmax = (forwardmax .+ backwardmin)/2
    return meanmax
end

N = 50
x = LinRange(-10,10,N)
y = atan.(x) + 0.25*rand(length(x))

q = iso(y)

basis = BSplineBasis(4,LinRange(extrema(x)...,N÷4))
spl = interpolate(basis, x, q)

plot(x,[y q spl.(x)]; label=["orig" "isotonic" "bspline"])

In the atan example, the result looks like this:

In terms of complexity, this is basically linear in length of data. Is it good enough for your needs?

baggepinnen · January 20, 2023, 5:28am

Yeah, you usually use custom optimization algorithms for this kind of problems, I just used the first one I could think of. It should however offer you a lot of flexibility, you can for example replace the smoothing penalty by a smoothing constraint on the max deviation over a number of samples etc.

rafael.guerra · January 20, 2023, 8:50am

Dan and Fredrik, thanks for the great comments. I’ll respond as soon as I have time to digest them.

rafael.guerra · January 21, 2023, 12:17pm

Hi Dan, after playing a bit with your nice B-splines minmax code, it seems that monotonicity is not enforced.

For example using 11 breakpoints on the noisy atan example:

x = -10:0.05:10
y = 5 * atan.(x) + 2 * rand(length(x))
basis11 = BSplineBasis(4, LinRange(extrema(x)..., 11))
spl11 = interpolate(basis11, x, q)

we get some decreasing intervals in the magenta dashed curve (but with 10 breakpoints it looks good):

So the main difficulty is in determining the number of knots to use (or where they should be placed, if we do not chose a linear range). But with some QC, the method is fast and may work nicely.

Dan · January 21, 2023, 12:44pm

The BSplines were just an automatic choice. Perhaps a different interpolation method is better suited (also depending on how smooth the output should be).
Something I found, which also has an implementation (in C) is:
https://en.wikipedia.org/wiki/Monotone_cubic_interpolation

Might be interesting to implement this.

The Rustaceans seem to have implemented it a week ago:
https://ruivieira.dev/monotonic-cubic-spline-interpolation-with-some-rust.html

JM_Beckers · January 21, 2023, 3:10pm

If the data you have are not on a regular grid, you might be interested in using DIVAnd. Originally it does not include inequality constraints but a quick hack provides one (and it will probably be included in a more general way later so that one can add linear inequality constraints):

using Plots
using DIVAnd
using LinearAlgebra
using SparseArrays

x = rand(-10:10,20)
y = atan.(x) + 0.1*randn(length(x))
scatter(x, y, ms=2)
xgrid=collect(-10:0.01:10)
plot!(xgrid, atan.(xgrid), lw=2)

mask = trues(size(xgrid))
mask[1]=false
mask[end]=false
pm = ones(size(xgrid)) / (xgrid[2]-xgrid[1]);
len=2.0
epsilon2=0.3
fi,s=DIVAndrun(mask,(pm,),(xgrid,),(x,),y,len,epsilon2)

# Normal smoothing interpolation, not ensuring monotonic increase
scatter(x, y, ms=2)
plot!(xgrid, fi, lw=2)



#Function to create a strong DIVAnd constraint when an inequality is not satisfied:

function Ineqtoeq(yo,H,xo)
    # Translates H x - yo > 0 into a quadratic weak constraint depending on a first guess of the statevector xo
    vv=ones(size(yo)[1]).*100000000.
    vv[H*xo.<=yo].= 0.0000001    
    R=Diagonal(vv)    
    return DIVAnd.DIVAnd_constrain(yo, R, H)
end
       

A = spzeros(s.sv.n, s.sv.n)

i=1     
S = sparse_stagger(size(mask), i, s.iscyclic[i])
m = (S * mask[:]) .== 1
A = A +  sparse_pack(mask) * S' * sparse_pack(m)' * s.Dx[i]

# Iterate
for llll=1:8
        yo=statevector_pack(s.sv, (0.0 .*fi,))
        fs=statevector_pack(s.sv, (fi,))
        posconstraint=Ineqtoeq(yo, A, fs)
        m2c=(posconstraint,)
        fi,s=DIVAndrun(mask,(pm,),(xgrid,),(x,),y,len,epsilon2;constraints=m2c)
      
end

scatter(x, y, ms=2)
plot!(xgrid, fi, lw=2)

Monotonic function sampled at random positions with noise

Classical smoothing interpolation

Smoothing interpolation with monotonicity constraint

dlakelan · January 21, 2023, 4:56pm

A very general purpose method is to set up a Bayesian model with a path integration approach to providing a prior. For example log probability increment is integrate(F(g’(x)),dx) where F goes to negative infinity when g’ is negative (ie. The derivative must be nonnegative everywhere)

Here’s a gist that discussed this kind of model in general terms Path Integral Approach For Functional Priors · GitHub

You could adapt it by creating a functional as suggested above.

rafael.guerra · January 21, 2023, 8:36pm

@dlakelan, thank you for your suggestion and example code, unfortunately such Bayesian implementation is beyond my current skills.

cgeoga · January 21, 2023, 8:37pm

Here is my lazy submission: just use JuMP+Ipopt, riffing on the suggestion of @Dan to use B-splines and just forcing the monotonicity as a constraint. This takes a second or two on my computer with 20k points and 200 knots. But I assume that JuMP is detecting the constraint jacobian sparsity and so it should presumably scale pretty well with more knots/basis functions. I’m probably also not actually plugging this into JuMP in the most slick and efficient way—I’m a pretty new convert.

using BSplines, JuMP, Ipopt

# Obviously not pretty, but you get the idea.
function give_basis_matrix(x, nknots)
  basis = BSplineBasis(4, range(extrema(x)..., nknots))
  M = reduce(hcat, map(basis) do basis_j
    [basis_j(t) for t in x]
  end)
  Matrix(M') # so you can slice along columns
end

function monotonic_bspline_regression(x, y, nknots=20)
  n = length(y)
  basis_matrix = give_basis_matrix(x, nknots)
  model = Model(Ipopt.Optimizer)
  @variable(model, coefs[i=1:nknots+2])
  @objective(model, Min, sum((y[j] - dot(coefs, basis_matrix[:,j]))^2 for j in 1:n))
  for j in 2:n
    @constraint(model, dot(coefs, basis_matrix[:,j]) >= dot(coefs, basis_matrix[:,j-1]))
  end
  optimize!(model)
  estimated_coefs = value.(coefs)
  basis_matrix'*estimated_coefs # estimated monotonic y
end

y_hat = monotonic_bspline_regression(x, y, 200) # for example

I bet you could use SCS as the solver and make a handful of better decisions. But even this is pretty fast.

Fun thread!

tcovert · January 21, 2023, 8:41pm

wanted to highlight the underlying idea in @Dan’s approach - “rearrangement” - is known to have some nice statistical properties. see here. estimating the function with whatever method you like best (splines, kernels, polynomials etc), and then re-arranging its graph so that it is monotone, as desired, is basically statistically ok.

lrnv · January 21, 2023, 9:44pm

Using P-splines (and not B-splines), you might enforce monotonicity (or other shapes or the random variable: see https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1348/000711005X84293?casa_token=f4edP7w5GIEAAAAA%3AZhgqwpV5Zf2DfOaHnNDvpu1OlZSdor1-GX5M85uKWpV7MHLfHQ57JHp9eZzSnzljty6ZyZfBD3u5Xf9W2g for the reference and more generally Google Scholar). There is code in R to do such things, but i could not find a P-spline package in Julia. Maybe something to implement some days

dlakelan · January 22, 2023, 12:25pm

Note that a function which is monotone when evaluated only at a few discrete points is not necessarily monotone in general, it could oscillate in between those discrete points.

Using some constraints on the second derivative could prevent such oscillations.

rafael.guerra · January 22, 2023, 2:26pm

If you refer to @Dan’s minmax iso() monotonic function, this function is really fantastic.

Simply applying a moving average to it, seems to get most of the job done.

Topic		Replies	Views
Smoothing interpolation for tachometer data New to Julia smoothing , splines	29	3642	March 3, 2021
Piecewise-constant fitting Optimization (Mathematical) jump , curve-fitting , convex-optimization , piecewise	21	2570	February 9, 2023
Monotone cubic spline interpolation on an irregular grid Specific Domains splines	2	507	April 27, 2023
Convexity-Preserving Rational Interpolation Numerics question , package , interpolations	11	824	April 12, 2022
Smoothing spline fit through a set of (x,y) points General Usage question , splines	7	1547	October 18, 2023

Monotonic smoothing of noisy data

Related topics