Efficient calculation of Mahalanobis distance


I would like to compute the following Mahalanobis distance for M samples \{ x^i \} with x^i \in \mathbb{R}^n

d(x^i) = (x^i - \mu)^\top \Sigma^{-1} (x^i - \mu) where we don’t know \Sigma but we only know a “square-root” matrix A \in \mathbb{R}^{n \times M} such that \Sigma = A A^\top.

For now, I am doing the following operations:

  1. Assemble \Sigma from A: \Sigma = A A^\top
  2. Extract the lower Cholesky factor L of \Sigma: L L^\top = \Sigma
  3. Solve the M linear systems L z^i = (x^i - \mu)
  4. Compute the square of the norm of \{z^i \}

This algorithm is squaring the conditioning number when we form \Sigma. I am wondering how we can avoid to form \Sigma and make the code faster. Here is a MWE:

using LinearAlgebra
using Statistics

Nx = 40
Ne = 50
λ = 1e-10
X = randn(Nx, Ne)

μX = randn(Nx)

# I picked a simpler definition for AX than in the true problem
AX = (X .- μX)

LX = cholesky(Symmetric(CX + λ*I)).L

dmahal = zeros(Ne)
for i=1:Ne
    dmahal[i] = sum(abs2, LX \ AX[:,i])

I would greatly appreciate your help.

Is your goal to learn or do you just need an efficient way to calculate the Mahalanobis distance?

If the latter, there is the distances package:

If the former, you could look at the code for Mahalanobis in distances.jl and see how it is done there.

1 Like

Thank you for pointing this resource. I have looked at their code but I couldn’t find routines to compute x^\top Q^{-1} x, but only for x^\top Q x.

My implementation is inspired by Distributions.jl and PDMat.jl. I am wondering how to reduce the conditioning number and make the code faster by exploiting the fact that we know some square root A of \Sigma.

From your example, it seems that A \in \mathbb{R}^{n\times M} is a “wide” matrix, i.e. more columns than rows (M \ge n). In this case, maybe use the LQ factorization?

That is, lq(A) forms a factorization A=LQ, where L \mathbb{R}^{n\times n} and Q \in \mathbb{R}^{M\times n} has orthonormal rows (QQ^T = I), equivalent to the QR factorization of A^T. Given this, \Sigma = AA^T = LQQ^T L^T = LL^T, and \Sigma^{-1} = L^{-T} L^{-1}. Hence your distance is simply d(x^i) = \Vert L^{-1} (x^i - \mu) \Vert.

In Julia:

L = LowerTriangular(lq(A).L)
dmahal = norm(L \ (x - μ))

Or, using your example code, I get:

julia> norm.(eachcol(LowerTriangular(lq(AX).L) \ AX)).^2 ≈ dmahal

(Note that you can apply L \ to every column of a matrix at once, which can be much more efficient than solving column-by-column, especially when the matrix is large.)

In general, the standard approaches to avoiding A^T A or AA^T computations (which square condition numbers) often involve QR or LQ factorization (respectively), or the SVD.

PS. I think the Mahalanobis distance is technically just the norm(...), not the norm squared as in your code.


Thank you very much @stevengj for your detailed answer! This is exactly what I am looking for. You’re correct A is a wide matrix M \leq n.

I actually need the square of the Mahalanobis distance in my code, so we can use
sum.(abs2,eachcol(LowerTriangular(lq(AX).L) \ AX))