# Sparse Spike Deconvolution: how to perform a cross validation using JuMP?

I’m basically trying to minimize the following matrix equation

Z(the data) and c are vectors while G and A are matrices. G is a matrix that receives a scalar parameter P. To create the model, I’ve set a fix P and tried to find the best c for a given P. The little g below “min” on the image can be ignored. The constraint is that all entries of vectors c must be positive.

My question is how do I perform a cross validation to find which P best fits the data?

(the data vector Z was artificially generated, I’ll leave the codes regarding how all these matrices were build separated. The main idea here is to compute the error between Z data and the algorithm that computes a value for Z, which is the product AGc. Hence why I’m setting a minimization for it. The context in here is trying to get a distribution of relaxation times from an impedance data. Which is pretty much solving for a decovolution problem.)

``````using JuMP, HiGHS

# Create an empty DRT model
DRT_model = Model(SCS.Optimizer)

@variable(DRT_model, c[1:num_points] >= 0 + 1e-3)

# Build the objective function
@objective(DRT_model, Min, sum(Zcat - A * G * c).^2)

optimize!(DRT_model)
``````

Below is how the matrices were computed. One more thing, Z entries are written as a + bi, Zcat on the otherhalf concatenated Z, the first rows corresponding to the real part and the last to the imaginary.

``````using Plots, LinearAlgebra, JuMP, HiGHS

struct Data
R1 :: Float64
R2 :: Float64
R3 :: Float64
C1 :: Float64
C2 :: Float64
C3 :: Float64
freq_start :: Float64
freq_end :: Float64
end

data = Data(
100.0,
300.0,
750.0,
0.01,
0.05,
0.5,
1e5,
1e-3,
20,
)

function generate_Z(data :: Data)

# Calculate the number of points

# Create logarithmically spaced frequency range
f = 10 .^ range(log10(data.freq_start), log10(data.freq_end), length = num_points)

# Define the proper elements for computing
ω = 2 * π * f
j = im
Z = ComplexF64[]

# Compute Z for each ω
for i in ω
Z1 = 1 / (1 / (data.R1) + j * i * data.C1)
Z2 = 1 / (1 / (data.R2) + j * i * data.C2)
Z3 = 1 / (1 / (data.R3) + j * i * data.C3)

Z_temp = Z1 + Z2 + Z3
push!(Z, Z_temp)
end

return num_points, f, ω, Z
end

# Call for the function
num_points, f, ω, Z = generate_Z(data)

function build_G(τ, num_points, P)
# Initiliaze a matrix with zeros
G = zeros(length(τ), num_points)

# Calculate each element of the matrix
for i in 1:length(τ)
for j in 1:num_points
G[i, j] = (1 / (2 * pi)) * sin(P * π) / (cosh(P * log(τ[i] / τ[j])) + cos(P * π))
end
end

# Normalize each column of the matrix
for i in 1:num_points
G[:, i] .= G[:, i] / sum(G[:, i])
end

return G
end

G = build_G(τ, num_points, 0.9)

function build_A(f, ω)
# Set the time constants τ limits
τ_max = 1 / maximum(f)
τ_min = 1 / minimum(f)

# Create a τ grid time domain
τ = 10 .^ range(log10(τ_min), log10(τ_max), length = 1000)

n = length(ω)  # Number of frequency points
m = length(τ)  # Number of relaxation times
A = zeros(2n, m)

# Compute trapezoidal coefficients 'a'
a = zeros(m)
a[1] = (τ[2] - τ[1]) / 2
a[end] = (τ[end] - τ[end - 1]) / 2
for j in 2:(m - 1)
a[j] = (τ[j + 1] - τ[j - 1]) / 2
end

# Fill in matrix A enties
for i in 1:n
ωiτ = ω[i] .* τ
for j in 1:m
# Adjust formulas for the real and imaginary parts by including 'a'
A[i, j] = a[j] * 1 / (1 +  ωiτ[j]^2)
A[n + i, j] =  a[j] * ωiτ[j]^2 / (1 + ωiτ[j]^2)
end
end

return A
end

A = build_A(f, ω)
``````

One additional detail, Z is the vector containing the impedance data, A is a discretization matrix based on both the kernel and discretization approach for deconvolute the data, G is a matrix that contains the chosen basis vector, a.k.a an impulse function, and c is the vector containing the latter weights.

Hi @Matheus, welcome to the forum.

My question is how do I perform a cross validation to find which P best fits the data?

Do you mean mathematically, or in code?

The simplest approach would be to solve the problem for a range of discretized `P` and pick the best result.

Hi @odow , thank you very much.

I was thinking of doing manually, since in thix context the parameter P changes from 0 to 1, but some papers that I was reading mentioned approaches like L-curve, Re-Im crossvalidation, and tbf I got lost.

Running the code, changing P from 0.1 up to 0.9 with a step of 0.1, I wasn’t able to find the correct DRT. I’m trying for find the best P just to be sure if there are any mistakes that passed by me when building the math model.

Cross-validation usually means that you do not use all of the data. Instead, you could divide the data into N subsets, then, find the `P` that has the best average objective value if you were to solve the N cases independently (so a different optimal `c` for each case).

This doesn’t seem specific to JuMP or Julia, so you should Google “k-fold cross validation.” The wikipedia page is a good place to start: Cross-validation (statistics) - Wikipedia.

JuMP doesn’t have any built-in support for this. You’ll need to manually code it yourself.

I see. I’ll have a look into that. Thank you very much.