Understanding Mahalanobis in Distances.jl

Jasper_Hall · April 21, 2021, 8:38am

Thanks for the reply. I was expecting a distance between each variable (column) and the envelope of the reference set. That would be 500 x 3 in this case, rather than a distance between every single observation and each member of the reference set, because that’s the point of the Mahalanobis distance.

I figured that the zeros on the diagonal were there for the reason you give, and that makes perfect sense. What I was hoping to understand is why the Python example gets a single value (that appears to come from the diagonal) for the distance of each observation to the reference set. I have seen other examples worked in Excel and elsewhere that also return a single value for the Mahalanobis distance, so I presumed there was a method to ‘collapse’ all the distances to a single value (root mean square of a row of distances - is that legitimate?).

EDIT: I implemented the Python example exactly as it is in the link and I get a 500 x 500 array out, from which they take the diagonal.

EDIT 2: It turns out that there is an alternative version of the Mahalanobis formula that uses the distance from each observation to the central mean, which, judging from the code, appears to be what the Python version is doing. This explains the dramatically different results, but it raises the question of how we can call both of these things “Mahalanobis distance”.
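
To make the two usages concrete, here is a minimal sketch in plain Python. The toy data, the variable names, and the hand-written 2 x 2 inverse are all assumptions for illustration; this is neither the linked Python example nor the Distances.jl implementation. It suggests that both results come from the same formula, sqrt((a - b)' Q (a - b)) with Q the inverse covariance, and differ only in what b is: another observation, or the mean of the reference set.

```python
from math import sqrt

# Hypothetical toy data: 4 observations of 2 variables (not from the thread).
X = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]
n = len(X)

# Column means and sample covariance (divisor n - 1).
mx = sum(x for x, _ in X) / n
my = sum(y for _, y in X) / n
sxx = sum((x - mx) ** 2 for x, _ in X) / (n - 1)
syy = sum((y - my) ** 2 for _, y in X) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in X) / (n - 1)

# Inverse of the 2 x 2 covariance matrix, written out by hand.
det = sxx * syy - sxy ** 2
q = ((syy / det, -sxy / det), (-sxy / det, sxx / det))


def mahalanobis(a, b):
    """Return sqrt((a - b)' Q (a - b)), with Q the inverse covariance above."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return sqrt(q[0][0] * dx * dx + 2 * q[0][1] * dx * dy + q[1][1] * dy * dy)


# Usage 1: every observation against every other one.
# The result is an n x n matrix with zeros on the diagonal.
pairwise = [[mahalanobis(a, b) for b in X] for a in X]

# Usage 2: every observation against the mean of the reference set.
# The result is a single value per observation.
to_mean = [mahalanobis(a, (mx, my)) for a in X]
```

In matrix form, the second list corresponds to the diagonal of (X - mu) Q (X - mu)' after taking square roots, which would be consistent with a 500 x 500 product whose diagonal is then extracted.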
