# Robust Mahalanobis distance in high dimensions with limited samples

Hello,

I would like to use an iterative algorithm to estimate the parameters of a t-distribution \pi_X characterized by a mean \mu \in \mathbb{R}^{Nx}, a positive semi-definite scale matrix C \in \mathbb{R}^{Nx \times Nx}, and degrees of freedom \nu > 2.

I have access to M i.i.d. samples \{ x^1, \ldots, x^M \} \sim \pi_X. I am usually in the low-data setting where M \leq Nx.

To generate the initial condition, I need to compute the Mahalanobis distance of the samples using the sample mean and covariance as the best estimates for the mean and scale matrix.

In the following example, where Nx = 60 and M = 40, the Mahalanobis distance is almost identical for all the samples due to the poor conditioning of the sample scale matrix. This issue doesn't appear in smaller dimensions.

I would greatly appreciate your help to regularize those estimates. Here is an MWE:

```julia
using LinearAlgebra
using Statistics
using Distributions          # for GenericMvTDist
using PDMats
using PositiveFactorizations # for cholesky(Positive, ·)

Nx = 60
νX = 5.0
μX = zeros(Nx)
CX = PDiagMat(ones(Nx))
πX = Distributions.GenericMvTDist(νX, μX, CX)

M = 40
X = rand(πX, M)

μXhat = vec(mean(X; dims = 2))
CXhat = cov(X') + 1e-4*I

# Fast implementation: δᵢ = ‖L⁻¹(xᵢ - μ̂)‖² with CXhat = L*L'
LXhat = cholesky(Positive, CXhat).L
δXhat = sum.(abs2, eachcol(LXhat \ (X .- μXhat)));

# Slow implementation: δᵢ = (xᵢ - μ̂)' CXhat⁻¹ (xᵢ - μ̂)
δXtest = zeros(M)
CXhatinv = inv(CXhat)  # hoist the inverse out of the loop
for i = 1:M
    δXtest[i] = (X[:,i] - μXhat)' * CXhatinv * (X[:,i] - μXhat)
end
```

We obtain the same results:

```julia
δXhat'
```

```
1×40 adjoint(::Vector{Float64}) with eltype Float64:
 38.025  38.025  38.025  38.025  38.025  …  38.025  38.025  38.025  38.025
```

```julia
cond(CXhat)
```

```
1.1151259113830333e9
```

Maybe some adaptation of the ideas of the graphical lasso would be useful for this?


Trying to understand the problem better: there are M samples (each of dimension Nx), and the parameters are \mu (of dimension Nx) and C (of dimension Nx \times Nx).
It seems there are more parameters than samples, so there is possible overfitting. Specifically, the sample mean and covariance overfit the distribution, giving (as calculated in the post) the same distance for each sample point (which would translate to the same likelihood).
Is this a fair description? If there is overfitting, more constraints (such as sparsity) or a regularizer (such as the L1 penalty in the lasso) can be used to select one set of parameters over others.
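As a hedged sketch of such a regularizer (my own illustration, not from the thread): linear shrinkage of the sample covariance toward a scaled identity, C(γ) = (1-γ)Ĉ + γ (tr Ĉ / Nx) I, makes the estimate positive definite for any γ > 0 and breaks the degeneracy of the distances. The helper `shrunk_mahalanobis` and the choice of γ are mine:

```julia
using LinearAlgebra, Statistics

# Linear shrinkage toward a scaled-identity target, Ledoit–Wolf style,
# with a hand-picked shrinkage weight γ ∈ (0, 1].
function shrunk_mahalanobis(X::AbstractMatrix, γ::Real)
    Nx, M = size(X)
    μ = vec(mean(X; dims = 2))
    C = cov(X')                      # Nx × Nx, rank ≤ M-1 when M ≤ Nx
    T = (tr(C) / Nx) * I(Nx)         # scaled-identity target
    Cγ = Symmetric((1 - γ) * C + γ * T)
    L = cholesky(Cγ).L               # Cγ is positive definite for γ > 0
    return [sum(abs2, L \ (x - μ)) for x in eachcol(X)]
end
```

With, say, γ = 0.5 on the data above, the M distances are no longer identical; picking γ in a principled way (e.g. by cross-validating the t-likelihood) is then the real modeling question.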

Just to support this overfitting view: the resulting distance of 38.025 for each sample appears to be (M-1)^2/M analytically, independent of Nx (which is logical, since the centered samples lie in an (M-1)-dimensional subspace).
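That closed form is easy to check numerically (a quick sanity check of my own, not part of the original posts): for any data with M ≤ Nx, adding a tiny ridge to the sample covariance reproduces (M-1)^2/M for every sample, whatever the generating distribution.

```julia
using LinearAlgebra, Statistics

M, Nx = 40, 60
X = randn(Nx, M)                   # any distribution works here
μ = vec(mean(X; dims = 2))
C = Symmetric(cov(X') + 1e-8 * I)  # tiny ridge to make C invertible
δ = [dot(x - μ, C \ (x - μ)) for x in eachcol(X)]
# every entry is ≈ (M - 1)^2 / M = 38.025, independent of Nx
```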
