Modeling a signal peak's shape with a spline from multiple datasets that captured the same event

I have time series data in which the recorded values capture a signal (i.e., a peak) that emerges from an elevated noise floor (i.e., a baseline, which can be modeled as a straight line). The shape of the peak is identical across the time series. What distinguishes the time series are the parameters of the baselines (i.e., slope and intercept), the scale of each peak, and the detection times. However, in this specific use case, both the differences in detection times between the time series (δt) and the peaks' scaling factors are known a priori. The example code below simulates two such time series (Figure column 1).

I would like to deduce the slope and the intercept of each time series’ baseline (Figure column 2) and approximate the shape of the peak with a spline (Figure column 3) using the data from all time series (in the example below, two) simultaneously.

I was able to approximate the signal in noisy data with a spline in the absence of a baseline. I was also able to fit baselines to flat noise floors (i.e., without a peak) while imposing certain constraints (e.g., the inferred baseline must not predict negative values of the noise floor within the time range). However, I am struggling to piece these components together and to solve this optimization problem with the packages available in Julia; a sketch of the joint formulation I have in mind follows the simulation code below. Any assistance in this regard would be greatly appreciated.

As indicated above, the optimization problem should ideally impose certain constraints, namely:

  1. Signals predicted by the deduced baselines cannot be negative within the analyzed time range.
  2. Signals of the predicted peaks cannot be negative in any of the time series.
  3. The weight of each sample point should be adjusted based on its associated error (optional).

using CairoMakie, Distributions, Random

################

baseline(intercept, slope, δt) = t -> intercept .+ slope * (t + δt)

signal(baseline, peak, δt, scale) = t -> baseline(t) + scale * peak(t + δt)

noisy!(signal, noise_factor) = t -> signal(t) + rand.(Normal.(0, σ_error(signal(t), noise_factor)))

σ_error(signal, noise_factor) = @. noise_factor * sqrt(signal) / sqrt(2/π)

################

ts = 5:5:100

noise_factor = 1.7

# Some signal distribution

peak(t) = 6 * 10^4 * pdf.(Normal(50, 5), t) # peak signal (to be inferred from the empirical data)

# Time series data 1

δt₁ = -0.5 # ion detection time shift relative to scan time (known)

bl₁ = baseline(2000, -5.1, δt₁) # baseline signal (to be inferred from the empirical data)

scale₁ = 1 # peak signal scaling factor (known)

# Time series data 2

δt₂ = -4.2 # ion detection time shift relative to scan time (known)

bl₂ = baseline(1000, 2, δt₂) # baseline signal (to be inferred from the empirical data)

scale₂ = 0.2 # peak signal scaling factor (known)

# Generate the signals of each ion by combining baseline signal and peak signal

signal₁ = signal(bl₁, peak, δt₁, scale₁) # ion 1

signal₂ = signal(bl₂, peak, δt₂, scale₂) # ion 2

fig = Figure(resolution = (800, 500))

# Ion 1

ax₁ = Axis(fig[1, 1])

ylims!(ax₁, 500, 7000)

lines!(ax₁, 1..100, signal₁, color=:red) # true signal

scatter!(ax₁, ts, noisy!(signal₁, noise_factor), color=:blue) # detected signal

# Ion 2

ax₂ = Axis(fig[2, 1])

ylims!(ax₂, 500, 2200)

lines!(ax₂, 1..100, signal₂, color=:red) # true signal

scatter!(ax₂, ts, noisy!(signal₂, noise_factor), color=:blue) # detected signal

# To be inferred baseline in time series data of ion 1

ax₃ = Axis(fig[1, 2])

ylims!(ax₃, 500, 7000)

lines!(ax₃, 5..100, bl₁)

# To be inferred baseline in time series data of ion 2

ax₄ = Axis(fig[2, 2])

ylims!(ax₄, 500, 2200)

lines!(ax₄, 5..100, bl₂)

# To be inferred empirical peak shape from signals in time series data of ion 1 and ion 2

ax₅ = Axis(fig[1:2, 3])

lines!(ax₅, 5..100, peak)

display(fig)
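To make the structure concrete, here is a sketch of the joint formulation I have in mind (an illustration, not a verified solution). It reuses ts, noise_factor, signal₁, signal₂, and the known δt and scale values from the simulation above, hand-rolls a cubic B-spline basis via the Cox–de Boor recursion, and poses the fit as a quadratic program; the knot count and placement are arbitrary guesses.

using JuMP, Ipopt

# Value of the i-th B-spline basis function of order k (degree k - 1) over
# the knot vector kn at point x (Cox–de Boor recursion).
function bspl(i, k, kn, x)
    k == 1 && return kn[i] <= x < kn[i+1] ? 1.0 : 0.0
    d₁ = kn[i+k-1] - kn[i]
    d₂ = kn[i+k] - kn[i+1]
    a = d₁ > 0 ? (x - kn[i]) / d₁ * bspl(i, k - 1, kn, x) : 0.0
    b = d₂ > 0 ? (kn[i+k] - x) / d₂ * bspl(i + 1, k - 1, kn, x) : 0.0
    a + b
end

order = 4                                        # cubic B-splines
inner = collect(range(0.0, 105.0, length = 12))  # interior knots (a guess)
kn = [fill(inner[1], order - 1); inner; fill(inner[end], order - 1)]
K = length(kn) - order                           # number of basis functions

# Noisy observations (re-sampled here; in practice, the recorded data).
data = [noisy!(signal₁, noise_factor).(ts), noisy!(signal₂, noise_factor).(ts)]
δts, scales = [δt₁, δt₂], [scale₁, scale₂]       # known a priori
N, J = length(ts), 2

model = Model(Ipopt.Optimizer)
@variable(model, intercept[1:J])
@variable(model, slope[1:J])
@variable(model, c[1:K] >= 0)  # c ≥ 0 keeps the spline peak nonnegative (constraint 2)

# Constraint 1: the deduced baselines must not go negative at the sampled times.
@constraint(model, [j in 1:J, i in 1:N], intercept[j] + slope[j] * (ts[i] + δts[j]) >= 0)

# Pin the spline to zero at both ends of the window, so that slowly varying
# trends are attributed to the baselines rather than to the peak.
@constraint(model, c[1] == 0)
@constraint(model, c[K] == 0)

# Prediction: per-series baseline plus the scaled, time-shifted common peak.
@expression(model, pred[j in 1:J, i in 1:N],
    intercept[j] + slope[j] * (ts[i] + δts[j]) +
    scales[j] * sum(c[k] * bspl(k, order, kn, ts[i] + δts[j]) for k in 1:K))

# Constraint 3 (optional) would replace the implicit unit weights with 1/σᵢ².
@objective(model, Min, sum((pred[j, i] - data[j][i])^2 for j in 1:J, i in 1:N))
optimize!(model)

peak_fit(t) = sum(value(c[k]) * bspl(k, order, kn, t) for k in 1:K)  # fitted peak shape

Since the model is linear in intercept, slope, and c, the objective is a convex quadratic, so the solver should find a global minimizer.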

Hi @oniehuis, welcome to the forum.

This looks like a fitting problem.

Something like this should get you started:

using Distributions

peak(t) = 6 * 10^4 * pdf.(Normal(50, 5), t)
baseline(intercept, slope, δt) = t -> intercept .+ slope * (t + δt)
signal(baseline, peak, δt, scale) = t -> baseline(t) + scale * peak(t + δt)
σ_error(signal, noise_factor) = @. noise_factor * sqrt(signal) / sqrt(2/π)
noisy!(signal, noise_factor) = t -> signal(t) + rand.(Normal.(0, σ_error(signal(t), noise_factor)))

ts = 5:5:100
noise_factor = 1.7
δt₁ = -0.5
bl₁ = baseline(2000, -5.1, δt₁)
scale₁ = 1
signal₁ = signal(bl₁, peak, δt₁, scale₁)
data = noisy!(signal₁, noise_factor).(ts)

using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
@variable(model, intercept, start = 0)
@variable(model, slope, start = 0)
@variable(model, -1 <= δt <= 1, start = 0)
@variable(model, prediction[1:length(ts)] >= 0)
@variable(model, residual[1:length(ts)])
@constraint(
    model, 
    [i in 1:length(ts)],
    prediction[i] == intercept + slope * (ts[i] + δt),
)
@constraint(model, residual .== prediction .- data)
@objective(model, Min, sum(residual.^2))
optimize!(model)
value(intercept), value(slope), value(δt)
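It is also worth checking that Ipopt actually converged before reading off the values, e.g.:

@assert termination_status(model) == MOI.LOCALLY_SOLVED  # Ipopt reports local optimality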

Hi @odow, thanks for the feedback and sorry for the late reply. I followed your advice and started modeling the problem with JuMP. However, one problem I am still struggling with is how to incorporate the known signal-strength-dependent error into the least-squares regression. The code below shows a simple example of how I wanted to approach this. The solution I came up with (the commented-out constraint near the end of the listing) causes the program to exit with the message that the problem has too few degrees of freedom. I am not sure what to make of this message. Any feedback would be appreciated! Best wishes, Oliver


using CairoMakie, Distributions, JuMP, Ipopt

# Baseline clamped at zero, so the pure signal can never go negative.
baseline(intercept, slope, t) = max(intercept + slope * t, 0)

σ_error(signal, noisefactor) = @. noisefactor * sqrt(signal) / sqrt(2/π)

function noisy!(data::Vector{<:Real}, noisefactor::Real)
    for i in eachindex(data)
        data[i] += rand(Normal(0, σ_error(data[i], noisefactor)))
        data[i] < 0 && (data[i] = 0)
    end
    data
end

ts = 1:1:100
noisefactor = 1.7
puresignal = baseline.(100, -1.2, ts)
noisysignal = noisy!(puresignal, noisefactor)

fig = Figure()
ax = Axis(fig[1, 1])
scatter!(ax, ts, noisysignal)

N = length(ts)
A = hcat(ones(Float64, N), ts)

m = Model(Ipopt.Optimizer)
@variable(m, x[1:2])
@variable(m, prediction[1:N] >= 0)
@constraint(m, prediction .== A * x)
@variable(m, residual[1:N])
@constraint(m, [i in 1:N], residual[i] == prediction[i] - noisysignal[i])

# I would like to normalize each residual by a predicted signal-dependent factor,
# e.g.:
# @constraint(m, [i in 1:N], residual[i] == (prediction[i] - noisysignal[i]) / σ_error(prediction[i], noisefactor))
# EXIT: Problem has too few degrees of freedom.

@objective(m, Min, sum(residual.^2))
optimize!(m)

lines!(ax, ts, value(x[1]) .+ value(x[2]) * ts)
display(fig)

I can’t reproduce your “too few degrees of freedom” error (did you add both residual[i] == constraints at the same time?).

But your normalization has a problem: it can result in division by sqrt(0) if prediction[i] is zero, which is undefined and will cause numerical problems.

You should consider other approaches to normalization.
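For example (a sketch using your σ_error): compute fixed weights from the observed values instead of the predictions. The weights are then plain numbers, the division-by-zero issue disappears, and the model remains a convex QP. The floor of 1.0 is an arbitrary guard against zero counts:

w = 1 ./ σ_error(max.(noisysignal, 1.0), noisefactor)  # constant per-point weights
@objective(m, Min, sum((w .* residual).^2))            # replaces the unweighted objective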

Hi @odow, thanks for your feedback and advice. You are right: I added both constraints by mistake. Thanks also for pointing out the 1/sqrt(0) problem. In my specific use case, I can get around this problem by being a bit more restrictive about what values prediction[1:N] can take: I specify prediction[1:N] >= eps() instead of prediction[1:N] >= 0. Is there a deeper reason why I can’t specify prediction[1:N] > 0? Best wishes, Oliver

The surface answer is that strict inequalities are ruled out in mathematical optimization models by convention, and therefore by the language implementations. The theoretical reason is that an optimum over a closed set is attained at a point of that set, which need not be true for an open set: for example, minimizing x subject to x > 0 has infimum 0, but no feasible x attains it (think asymptotic behaviour).


You should probably use something like prediction[1:N] >= 1e-4.

Solvers like Ipopt use a feasibility tolerance (usually 1e-6 or 1e-8), so a constraint like prediction[1:N] >= 1e-4 is really prediction[1:N] >= 1e-4 - 1e-6.
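If that small relaxation ever matters, Ipopt also exposes options to control it (a sketch; bound_relax_factor and honor_original_bounds are documented Ipopt options):

set_optimizer_attribute(m, "bound_relax_factor", 0.0)      # disable the bound relaxation
set_optimizer_attribute(m, "honor_original_bounds", "yes") # project the final point onto the original bounds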

Hi @jd-foster, hi @odow, thank you for your explanations and advice. Your feedback helped me a lot to get closer to the code that solves the question in my post. I really started to appreciate the functionality of JuMP. Best wishes, Oliver
