DBSCAN clustering with Haversine metric

Hi there,

I have an array of points (latitude, longitude) in radians and I am trying to cluster them with the package Clustering.jl. According to the documentation a metric can be given, the default being Euclidean(). However, I need to use the Haversine metric and I can't find a way to pass it. Any ideas?

Thanks a lot!

Perhaps you can try the clustering API in GeoStats.jl. I believe we have all the Clustering.jl models over there too. Latitude/longitude coordinates will be handled for you internally. It has been a while since we last tested DBSCAN, though.

Looking at the documentation, it seems that only DBSCAN takes a metric keyword argument:

DBSCAN · Clustering.jl.

You should be able to pass Haversine() to it instead of Euclidean().

Some (not all) of the other methods, e.g. hierarchical clustering and k-medoids, take a distance matrix instead of a data matrix, so you could compute all pairwise distances yourself with Haversine() first.
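To make the distance-matrix route concrete, here is a minimal, language-neutral sketch in plain Python (not the Clustering.jl or Distances.jl API). It assumes points are (latitude, longitude) pairs in radians, as in the question, and a mean Earth radius of 6371 km. The names `haversine`, `pairwise_haversine` and `EARTH_RADIUS_KM` are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, in kilometres


def haversine(p, q, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def pairwise_haversine(points, radius=EARTH_RADIUS_KM):
    """Symmetric n-by-n matrix of great-circle distances (O(n^2) time and memory)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine(points[i], points[j], radius)
            dist[i][j] = dist[j][i] = d
    return dist
```

The resulting matrix is what a clustering routine that accepts precomputed distances would consume. The quadratic cost in the number of points is the main drawback of this route.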

I tried passing Haversine() (and a few variations of the same call), but it did not work. Yes, I could compute the pairwise distances, but as the documentation says, that approach is less efficient, so I was wondering whether it could be avoided. After all, it seems like there should be a way to pass a Haversine metric in place of Euclidean().

Alternative: project the geographic coordinates into a Cartesian system and use Euclidean distances there. At the end, convert the results back to geographic coordinates.
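As a rough illustration of that idea, the sketch below uses a simple equirectangular projection centred on the mean of the points. This is only one possible choice of projection and is an assumption here, not something the thread prescribes. It is reasonable only for regions that are small relative to the Earth, away from the poles, and not straddling the antimeridian. Input is (latitude, longitude) in radians; `to_local_xy` and `to_lat_lon` are hypothetical helper names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, in kilometres


def to_local_xy(points, radius=EARTH_RADIUS_KM):
    """Project (lat, lon) in radians to local planar (x, y) in kilometres.

    Returns the projected points and the origin needed to invert the projection.
    """
    lat0 = sum(lat for lat, _ in points) / len(points)
    lon0 = sum(lon for _, lon in points) / len(points)
    xy = [(radius * (lon - lon0) * math.cos(lat0),  # east-west offset
           radius * (lat - lat0))                   # north-south offset
          for lat, lon in points]
    return xy, (lat0, lon0)


def to_lat_lon(xy, origin, radius=EARTH_RADIUS_KM):
    """Invert to_local_xy, returning (lat, lon) in radians."""
    lat0, lon0 = origin
    return [(lat0 + y / radius, lon0 + x / (radius * math.cos(lat0)))
            for x, y in xy]
```

Euclidean clustering can then run on the projected points, and any coordinates it produces (for example cluster centres) can be mapped back with `to_lat_lon`. The projection distorts distances more as the region grows, so the clustering radius is only approximately a great-circle distance.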