TLDR: I have a parameter-inference problem that involves inferring the matrix of a linear system, but I am having trouble either avoiding the trivial solution (matrix = all zeros) or getting usable results.
I have a system akin to
\vec{X}(t) = \vec{b} + diag(\vec{\lambda}(\delta))[\vec{X}(t-\delta) - \vec{b}] + M\vec{Z}(t,\delta)
where \vec{X} is my data; \vec{b} and \vec{\lambda} are parameter vectors and M is a square parameter matrix; \vec{Z} is noise. Everything is of size m = 12 (or m \times m). I’m trying to see whether I can infer the parameters by minimizing \vec{Z} across different time-step lengths simultaneously. Something like
\min \sum_{i, t, \delta} Z_i(t, \delta)^2
so that for each time point t and step length \delta we are minimizing the squared norm of
\vec{Z}(t, \delta) = M^{-1}\{\vec{X}(t) - \vec{b} - diag(\vec{\lambda}(\delta))[\vec{X}(t-\delta) - \vec{b}]\}
I’m having trouble figuring out a way to get usable results for this. I was not sure I wanted my matrix M to be positive semidefinite, because I did not necessarily want the different coordinates X_i to have symmetric relationships with the noise coordinates Z_j. Because trying to solve directly for a matrix A = M^{-1} yields the trivial solution A_{ij} = 0, I instead tried solving the linear system inside the loss function. Although I did get a result, I am satisfied with neither how it looks nor the performance (a single gradient evaluation takes 15 s). Here’s a paraphrased version of the code:
using ForwardDiff
using LinearAlgebra
using Optimization

function myloss_systemsolve(
    parameters::AbstractVector{T},
    sample_series::AbstractMatrix;
    max_stepsize::Integer = 90,
) where {T<:Real}
    nvars, nsamples = size(sample_series)
    # Unpack the flat parameter vector: lambda arguments, then b, then M.
    lamb_arg_vec = parameters[1:nvars]
    b_vec = parameters[nvars+1:2*nvars]
    system_matrix = reshape(parameters[2*nvars+1:end], (nvars, nvars))
    loss = zero(T)
    for stepsize = 1:max_stepsize
        lamb_vec = exp.(-stepsize * lamb_arg_vec)
        for t = (stepsize+1):nsamples
            # Recover the implied noise Z(t, stepsize) by solving M * Z = misfit.
            residuals =
                system_matrix \ (
                    sample_series[:, t] - b_vec -
                    lamb_vec .* (sample_series[:, t-stepsize] - b_vec)
                )
            loss += sum(abs2, residuals)
        end
    end
    return loss
end
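Incidentally, one inefficiency I can already see: every `\` above re-factorizes system_matrix from scratch, once per (t, stepsize) pair. Factorizing once per loss evaluation should remove most of that cost. A sketch of what I mean (myloss_factored is a made-up name, and I believe lu falls back to a generic implementation for ForwardDiff dual numbers, though I have not verified the speedup):

# Untested sketch: same loss as myloss_systemsolve, but factorize M once per
# evaluation and reuse the factorization for every right-hand side.
function myloss_factored(
    parameters::AbstractVector{T},
    sample_series::AbstractMatrix;
    max_stepsize::Integer = 90,
) where {T<:Real}
    nvars, nsamples = size(sample_series)
    lamb_arg_vec = parameters[1:nvars]
    b_vec = parameters[nvars+1:2*nvars]
    system_matrix = reshape(parameters[2*nvars+1:end], (nvars, nvars))
    # The factorization cost is now paid once instead of once per solve.
    F = lu(system_matrix)
    loss = zero(T)
    for stepsize = 1:max_stepsize
        lamb_vec = exp.(-stepsize * lamb_arg_vec)
        for t = (stepsize+1):nsamples
            residuals = F \ (
                sample_series[:, t] - b_vec -
                lamb_vec .* (sample_series[:, t-stepsize] - b_vec)
            )
            loss += sum(abs2, residuals)
        end
    end
    return loss
end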
function myfit_systemsolve(
    sample_series::AbstractMatrix{<:Real};
    maxiters::Integer = 10000,
    max_stepsize::Integer = 90,
)
    nvars, nsamples = size(sample_series)
    localloss(x) = myloss_systemsolve(x, sample_series; max_stepsize = max_stepsize)
    # Initial guess: small random values for the lambda arguments and b,
    # the identity matrix for M.
    opt_params_a = 0.001 * randn(2 * nvars)
    opt_params_a[1:nvars] = max.(1e-8, opt_params_a[1:nvars])
    opt_params_b = vec(Matrix{Float64}(LinearAlgebra.I, (nvars, nvars)))
    opt_params = vcat(opt_params_a, opt_params_b)
    # Bounds: lambda arguments are nonnegative; b and M are unconstrained.
    lb_lamb = zeros(nvars)
    lb_b = fill(-Inf, nvars)
    lb_sys = fill(-Inf, nvars^2)
    lb = vcat(lb_lamb, lb_b, lb_sys)
    ub_lamb = fill(Inf, nvars)
    ub_b = fill(Inf, nvars)
    ub_sys = fill(Inf, nvars^2)
    ub = vcat(ub_lamb, ub_b, ub_sys)
    opt_func = OptimizationFunction((x, p) -> localloss(x), Optimization.AutoForwardDiff())
    opt_prob = OptimizationProblem{true}(opt_func, opt_params; lb = lb, ub = ub)
    opt_solution = solve(
        opt_prob,
        Optimization.LBFGS(30),
        maxiters = maxiters,
    )
    println(opt_solution.stats)
    lamb_vec = opt_solution.u[1:nvars]
    b_vec = opt_solution.u[nvars+1:2*nvars]
    system_matrix = reshape(opt_solution.u[2*nvars+1:end], (nvars, nvars))
    # Rescale M because its entries come out very large (see note below).
    system_matrix .= system_matrix ./ maximum(system_matrix)
    return lamb_vec, b_vec, system_matrix
end
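For reference, I call the fit roughly like this (random placeholder data standing in for my real m = 12 series; the keyword values here are arbitrary):

# Placeholder data; the real series comes from the process described above.
series = randn(12, 500)
lamb_args, b_fit, M_fit = myfit_systemsolve(series; maxiters = 2000, max_stepsize = 30)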
You can see that at the end of myfit_systemsolve I am rescaling the entire matrix M, because I am getting very big numbers in its entries (particularly on the diagonal). In hindsight that may be inherent to the formulation: the residuals are M^{-1} applied to the data misfit, so scaling M up by a constant scales the loss down by its square, and nothing in the loss pins down the overall scale of M.
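One thing I may try is pinning the scale inside the loss instead of rescaling afterwards, e.g. replacing the `system_matrix = ...` line in myloss_systemsolve with something like this (untested):

# Untested: normalize M inside the loss, so inflating M can no longer
# shrink the residuals for free (norm here is the Frobenius norm).
raw_matrix = reshape(parameters[2*nvars+1:end], (nvars, nvars))
system_matrix = raw_matrix ./ norm(raw_matrix)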
I then tried to optimize A = M^{-1} directly, using a log-determinant regularization to try to ensure invertibility, so that the optimizer avoids the trivial solution:
function myloss_direct(
    parameters::AbstractVector{T},
    sample_series::AbstractMatrix;
    max_stepsize::Integer = 90,
) where {T<:Real}
    nvars, nsamples = size(sample_series)
    lamb_arg_vec = parameters[1:nvars]
    b_vec = parameters[nvars+1:2*nvars]
    inv_system_matrix = reshape(parameters[2*nvars+1:end], (nvars, nvars))
    # Log-determinant regularization; this is the call that throws the
    # DomainError below whenever det(A) goes negative during a line search.
    loss = zero(T) - 0.001 * logdet(inv_system_matrix)
    for stepsize = 1:max_stepsize
        lamb_vec = exp.(-stepsize * lamb_arg_vec)
        for t = (stepsize+1):nsamples
            # Here Z = A * misfit with A = M^{-1}, so no linear solve is needed.
            residuals =
                inv_system_matrix * (
                    sample_series[:, t] - b_vec -
                    lamb_vec .* (sample_series[:, t-stepsize] - b_vec)
                )
            loss += sum(abs2, residuals)
        end
    end
    return loss
end
function myfit_direct(
    sample_series::AbstractMatrix{<:Real};
    maxiters::Integer = 10000,
    max_stepsize::Integer = 90,
)
    nvars, nsamples = size(sample_series)
    localloss(x) = myloss_direct(x, sample_series; max_stepsize = max_stepsize)
    # Same initialization and bounds as myfit_systemsolve, but the matrix
    # block of the parameter vector now represents A = M^{-1}.
    opt_params_a = 0.001 * randn(2 * nvars)
    opt_params_a[1:nvars] = max.(1e-8, opt_params_a[1:nvars])
    opt_params_b = vec(Matrix{Float64}(LinearAlgebra.I, (nvars, nvars)))
    opt_params = vcat(opt_params_a, opt_params_b)
    lb_lamb = zeros(nvars)
    lb_b = fill(-Inf, nvars)
    lb_sys = fill(-Inf, nvars^2)
    lb = vcat(lb_lamb, lb_b, lb_sys)
    ub_lamb = fill(Inf, nvars)
    ub_b = fill(Inf, nvars)
    ub_sys = fill(Inf, nvars^2)
    ub = vcat(ub_lamb, ub_b, ub_sys)
    opt_func = OptimizationFunction((x, p) -> localloss(x), Optimization.AutoForwardDiff())
    opt_prob = OptimizationProblem{true}(opt_func, opt_params; lb = lb, ub = ub)
    opt_solution = solve(
        opt_prob,
        Optimization.LBFGS(30),
        maxiters = maxiters,
    )
    println(opt_solution.stats)
    lamb_vec = opt_solution.u[1:nvars]
    b_vec = opt_solution.u[nvars+1:2*nvars]
    # Invert A to recover M, then rescale as before.
    system_matrix = inv(reshape(opt_solution.u[2*nvars+1:end], (nvars, nvars)))
    system_matrix .= system_matrix ./ maximum(system_matrix)
    return lamb_vec, b_vec, system_matrix
end
The problem is that the optimizer runs into negative determinants during the line search:
ERROR: DomainError with -1.0:
log was called with a negative real argument but will only return a complex result if called with a complex argument. Try log(Complex(x)).
DomainError detected in the user `f` function. This occurs when the domain of a function is violated.
For example, `log(-1.0)` is undefined because `log` of a real number is defined to only output real
numbers, but `log` of a negative number is complex valued and therefore Julia throws a DomainError
by default. Cases to be aware of include:
* `log(x)`, `sqrt(x)`, `cbrt(x)`, etc. where `x<0`
* `x^y` for `x<0` floating point `y` (example: `(-1.0)^(1/2) == im`)
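One stopgap I have considered is regularizing with log|det A| instead, since logabsdet from LinearAlgebra is defined for negative determinants too. That changes what the penalty means slightly, so this is just a sketch, replacing the logdet line in myloss_direct:

# Untested: logabsdet returns (log(abs(det(A))), sign(det(A))) and does not
# throw when the determinant goes negative during a line search.
logabsdet_A, _ = logabsdet(inv_system_matrix)
loss = zero(T) - 0.001 * logabsdet_A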
Finally, I am now pondering whether I should just give up on allowing M to be non-symmetric and use something like the system-solving version with a Cholesky factor inside it. I’m not really sure how to achieve that yet; I guess I need a function that transforms the matrix block of the parameter vector into a triangular matrix inside the loss function, as in the sketch below.
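In case it helps the discussion, this is the kind of transformation I have in mind (untested; vec_to_lower_tri is just a name I made up): take nvars*(nvars+1)/2 parameters, fill the lower triangle column by column, and exponentiate the diagonal so that M = L * L' is symmetric positive definite by construction.

# Untested sketch: map a parameter vector of length nvars*(nvars+1)/2 to a
# lower-triangular matrix whose diagonal is strictly positive, so that
# M = L * L' is symmetric positive definite by construction.
function vec_to_lower_tri(params::AbstractVector{T}, nvars::Integer) where {T<:Real}
    L = zeros(T, nvars, nvars)
    k = 1
    for j = 1:nvars, i = j:nvars
        # exp keeps the diagonal positive; off-diagonals are unconstrained.
        L[i, j] = i == j ? exp(params[k]) : params[k]
        k += 1
    end
    return L
end

With M = L * L', recovering Z would take two triangular solves, LowerTriangular(L)' \ (LowerTriangular(L) \ misfit), which should also be cheaper than the general solve in myloss_systemsolve.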
Any thoughts would be appreciated here.