Efficient calculation of Mahalanobis distance

mleprovost · November 11, 2022, 2:16pm

Hello,

I would like to compute the following Mahalanobis distance for M samples \{ x^i \} with x^i \in \mathbb{R}^n

d(x^i) = (x^i - \mu)^\top \Sigma^{-1} (x^i - \mu) where we don’t know \Sigma but we only know a “square-root” matrix A \in \mathbb{R}^{n \times M} such that \Sigma = A A^\top.

For now, I am doing the following operations:

Assemble \Sigma from A: \Sigma = A A^\top
Extract the lower Cholesky factor L of \Sigma: L L^\top = \Sigma
Solve the M linear systems L z^i = (x^i - \mu)
Compute the square of the norm of \{z^i \}

This algorithm is squaring the conditioning number when we form \Sigma. I am wondering how we can avoid to form \Sigma and make the code faster. Here is a MWE:

using LinearAlgebra
using Statistics

Nx = 40
Ne = 50
λ = 1e-10
X = randn(Nx, Ne)

μX = randn(Nx)

# I picked a simpler definition for AX than in the true problem
AX = (X .- μX)
CX = AX*AX'

LX = cholesky(Symmetric(CX + λ*I)).L

dmahal = zeros(Ne)
for i=1:Ne
    dmahal[i] = sum(abs2, LX \ AX[:,i])
end

I would greatly appreciate your help.

blackeneth · November 11, 2022, 3:06pm

Is your goal to learn or do you just need an efficient way to calculate the Mahalanobis distance?

If the latter, there is the distances package:

If the former, you could look at the code for Mahalanobis in distances.jl and see how it is done there.

mleprovost · November 11, 2022, 3:24pm

Thank you for pointing this resource. I have looked at their code but I couldn’t find routines to compute x^\top Q^{-1} x, but only for x^\top Q x.

My implementation is inspired by Distributions.jl and PDMat.jl. I am wondering how to reduce the conditioning number and make the code faster by exploiting the fact that we know some square root A of \Sigma.

stevengj · November 11, 2022, 5:55pm

From your example, it seems that A \in \mathbb{R}^{n\times M} is a “wide” matrix, i.e. more columns than rows (M \ge n). In this case, maybe use the LQ factorization?

That is, lq(A) forms a factorization A=LQ, where L \mathbb{R}^{n\times n} and Q \in \mathbb{R}^{M\times n} has orthonormal rows (QQ^T = I), equivalent to the QR factorization of A^T. Given this, \Sigma = AA^T = LQQ^T L^T = LL^T, and \Sigma^{-1} = L^{-T} L^{-1}. Hence your distance is simply d(x^i) = \Vert L^{-1} (x^i - \mu) \Vert.

In Julia:

L = LowerTriangular(lq(A).L)
dmahal = norm(L \ (x - μ))

Or, using your example code, I get:

julia> norm.(eachcol(LowerTriangular(lq(AX).L) \ AX)).^2 ≈ dmahal
true

(Note that you can apply L \ to every column of a matrix at once, which can be much more efficient than solving column-by-column, especially when the matrix is large.)

In general, the standard approaches to avoiding A^T A or AA^T computations (which square condition numbers) often involve QR or LQ factorization (respectively), or the SVD.

PS. I think the Mahalanobis distance is technically just the norm(...), not the norm squared as in your code.

mleprovost · November 11, 2022, 7:10pm

Thank you very much @stevengj for your detailed answer! This is exactly what I am looking for. You’re correct A is a wide matrix M \leq n.

mleprovost · November 11, 2022, 7:18pm

I actually need the square of the Mahalanobis distance in my code, so we can use
sum.(abs2,eachcol(LowerTriangular(lq(AX).L) \ AX))

Topic		Replies	Views
Understanding Mahalanobis in Distances.jl General Usage distances	4	1229	April 26, 2021
Square root of the inverse of a matrix Numerics linearalgebra	17	5543	June 19, 2020
Sparse Cholesky of Gram Matrix (SuiteSparse?) Numerics question , linearalgebra	25	2602	April 11, 2020
Could this be faster using some decomposition? Performance linearalgebra	4	342	January 27, 2022
Why lu() factorization is not reducing the execution time? Performance question , linearalgebra	10	951	September 24, 2021

Efficient calculation of Mahalanobis distance

Related topics