Hello,

I would like to use an iterative algorithm to estimate the parameters of a t-distribution \pi_X characterized by a mean \mu \in \mathbb{R}^{N_x}, a positive semi-definite scale matrix C \in \mathbb{R}^{N_x \times N_x}, and \nu > 2 degrees of freedom.

I have access to M i.i.d. samples \{ x^1, \ldots, x^M \} \sim \pi_X. I am usually in the low-data regime where M \leq N_x.

To generate the initial condition, I need to compute the Mahalanobis distances of the samples, using the sample mean and sample covariance as the best available estimates of the mean and scale matrix.
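Concretely, for each sample x^i I compute \delta_i = (x^i - \hat{\mu})^\top \hat{C}^{-1} (x^i - \hat{\mu}), where \hat{\mu} and \hat{C} denote the sample mean and the (regularized) sample covariance.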

In the following example, with N_x = 60 and M = 40, the Mahalanobis distance is almost identical across all samples: the sample covariance has rank at most M - 1 = 39 < N_x, so it is (up to the jitter term) singular and extremely ill-conditioned. Note that the common value 38.025 in the output below is exactly (M-1)^2/M = 39^2/40, which is what one obtains in the fully singular case. This issue doesn’t appear in smaller dimensions (M > N_x).
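A quick check (independent of the t-distribution; here I just use Gaussian samples and the pseudo-inverse of the singular sample covariance) suggests the collapse is exact in the singular limit, not merely a conditioning artifact:

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
Nx, M = 60, 40                         # more dimensions than samples
X = randn(Nx, M)                       # any point cloud; the effect is purely geometric
μhat = vec(mean(X; dims = 2))
Xc = X .- μhat
S = Xc * Xc' / (M - 1)                 # sample covariance, rank ≤ M - 1 < Nx
Sp = pinv(S)                           # Moore–Penrose pseudo-inverse
δ = [dot(c, Sp * c) for c in eachcol(Xc)]
extrema(δ)                             # every δ_i equals (M - 1)^2 / M = 38.025
```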

I would greatly appreciate your help regularizing these estimates. Here is a MWE:

```julia
using LinearAlgebra
using Statistics
using Distributions
using PDMats
using PositiveFactorizations  # provides cholesky(Positive, ...)

Nx = 60
νX = 5.0
μX = zeros(Nx)
CX = PDiagMat(ones(Nx))
πX = Distributions.GenericMvTDist(νX, μX, CX)

M = 40
X = rand(πX, M)                     # Nx × M matrix of samples
μXhat = vec(mean(X; dims = 2))
CXhat = cov(X') + 1e-4 * I          # diagonal jitter to make the estimate invertible

# Fast implementation: with CXhat = L*L', δ_i = ‖L \ (x_i - μ)‖²
LXhat = cholesky(Positive, CXhat).L
δXhat = sum.(abs2, eachcol(LXhat \ (X .- μXhat)))

# Slow reference implementation
δXtest = zeros(M)
for i in 1:M
    δXtest[i] = (X[:, i] - μXhat)' * inv(CXhat) * (X[:, i] - μXhat)
end

# Both implementations agree
δXhat'
```

`1×40 adjoint(::Vector{Float64}) with eltype Float64: 38.025 38.025 38.025 38.025 38.025 … 38.025 38.025 38.025 38.025`

```julia
cond(CXhat)
```

`1.1151259113830333e9`
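One direction I have been considering, beyond the diagonal jitter, is plain linear shrinkage of the sample covariance toward a scaled identity. This is only a minimal sketch: the intensity λ below is hand-picked, not data-driven (a principled intensity would come from something like the Ledoit–Wolf estimator), and I use Gaussian samples as a stand-in for the t-samples above:

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
Nx, M = 60, 40
X = randn(Nx, M)                    # stand-in for the t-samples above
S = cov(X')                         # rank-deficient: rank ≤ M - 1 < Nx
λ = 0.2                             # hand-picked shrinkage intensity
target = (tr(S) / Nx) * I           # scaled identity preserving the average variance
Sλ = Symmetric((1 - λ) * S + λ * target)
cond(Sλ)                            # orders of magnitude better conditioned than S + 1e-4*I
```

With this estimate the smallest eigenvalue is bounded below by λ·tr(S)/Nx, so the Mahalanobis distances no longer collapse onto a single value.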