Simple use of Geodesy.jl to get Euclidean distances?

You have three options:

  1. Cartesian (Euclidean) distance, which is easy to calculate but loses accuracy as distance grows. First, though, you must convert lat/lon to meters measured from the Prime Meridian (E +, W −) and the Equator (N +, S −). You don’t want to do that.
  2. Haversine distance, using the Haversine.jl package, which measures distance along a spherical model of the Earth. The path it traces is longer than the straight-line Cartesian one, and its accuracy relative to the Cartesian approach improves as distance increases. It also works directly with lat/lon.
  3. Geodesic distance, using the SimpleFeatures.jl package, which accounts for the Earth being an oblate spheroid rather than a perfect sphere. It lets you specify the standard US NAD83 CRS (coordinate reference system) and has a function for calculating the distance between two points.
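To make options 1 and 2 concrete, here is a minimal Julia sketch with no package dependencies. It assumes a spherical Earth with a 6,371 km mean radius, and for option 1 it uses a simple equirectangular projection (my choice for illustration; any projection to meters would do). The names `haversine_m` and `flat_euclidean_m` are hypothetical, not from any package, and the Seattle/Miami coordinates are approximate.

```julia
# Option 2: great-circle distance on a sphere via the haversine formula.
# Inputs are latitude/longitude in degrees; the 6,371 km radius is an assumption.
function haversine_m(lat1, lon1, lat2, lon2; r = 6_371_000.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(lon2 - lon1)
    h = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2 * r * asin(sqrt(min(h, 1.0)))   # clamp guards against rounding just above 1
end

# Option 1: project to meters (x east of the Prime Meridian, y north of the
# Equator) with an equirectangular projection, then take the plain Euclidean
# distance. Scaling x by cos(mean latitude) keeps this reasonable only for
# nearby points.
function flat_euclidean_m(lat1, lon1, lat2, lon2; r = 6_371_000.0)
    k = cos(deg2rad((lat1 + lat2) / 2))
    x1, y1 = r * k * deg2rad(lon1), r * deg2rad(lat1)
    x2, y2 = r * k * deg2rad(lon2), r * deg2rad(lat2)
    return hypot(x2 - x1, y2 - y1)
end

# Two nearby points vs. two far-apart points (approximately Seattle and Miami)
near = (40.0, -100.0, 40.01, -100.01)
far = (47.6062, -122.3321, 25.7617, -80.1918)
for (label, p) in (("near", near), ("far", far))
    hv, fe = haversine_m(p...), flat_euclidean_m(p...)
    println(label, ": haversine = ", round(hv; digits = 1), " m, flat = ",
            round(fe; digits = 1), " m, relative difference = ", abs(hv - fe) / hv)
end
```

For the nearby pair the two agree closely; for the cross-country pair the flat approximation drifts noticeably, which is the "accuracy degrades with distance" point above.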

The big advantage of SimpleFeatures.jl is that it is just a DataFrame that holds a Geometry object representing points, lines, polygons, or multipolygons. That means you can keep your GEOID, county name, state name (if you want), lat/lon from the TIGER files, the Geometry object, and whatever attributes you are collecting all in a single DataFrame.

Depending on what you’re planning to do, there are a couple of other considerations.

If for some reason you want pairwise data for all counties, even though many pairs will be zero-valued, consider using a SparseArray. For choropleth mapping, I haven’t found a tool I really like. There’s an implementation of Plotly for roadmap-type work, and GeoStats.jl is expert-level. So I usually resort to ggplot2 in R, either through RCall or natively. I haven’t tried TidierPlots.jl yet, but it doesn’t have an equivalent of ggplot2’s geom_sf for working with SimpleFeatures, and neither do Gadfly.jl or Makie.jl (even with GeoMakie.jl).
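For the SparseArray suggestion, a minimal sketch using the `SparseArrays` standard library might look like this. The rule for which pairs are "nonzero" is an assumption made up for illustration (pairs closer than a cutoff distance); swap in whatever pairwise quantity you actually collect. `sparse_pairwise` and the sample coordinates are hypothetical.

```julia
using SparseArrays

# Haversine great-circle distance in meters (spherical Earth, 6,371 km radius assumed)
function haversine_m(lat1, lon1, lat2, lon2; r = 6_371_000.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    h = sin((φ2 - φ1) / 2)^2 + cos(φ1) * cos(φ2) * sin(deg2rad(lon2 - lon1) / 2)^2
    return 2 * r * asin(sqrt(min(h, 1.0)))
end

# Build a symmetric sparse matrix that stores a value only for pairs that
# matter; here (hypothetically) pairs closer than `maxdist` meters. Every
# other entry is an implicit zero and takes no memory.
function sparse_pairwise(coords, maxdist::Real)
    n = length(coords)
    I, J, V = Int[], Int[], Float64[]
    for i in 1:n, j in (i + 1):n
        d = haversine_m(coords[i]..., coords[j]...)
        if d <= maxdist
            # store both (i, j) and (j, i) so the matrix is symmetric
            append!(I, (i, j)); append!(J, (j, i)); append!(V, (d, d))
        end
    end
    return sparse(I, J, V, n, n)
end

# Hypothetical (lat, lon) centroids: two close together, one far away
pts = [(47.60, -122.33), (47.25, -122.44), (25.76, -80.19)]
S = sparse_pairwise(pts, 100_000.0)
println("stored entries: ", nnz(S), " of ", length(S))
```

With roughly 3,000 counties the double loop visits a few million pairs, which is fine; only the stored entries cost memory.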

So for now I went with Distances.jl and am using its haversine distance. I’m fitting my desirability polynomial directly on latitude and longitude. We’ll see how this goes.

I don’t think this was mentioned yet, but the lat/lon coordinates are very likely in the WGS 1984 coordinate reference system, which models the Earth as an ellipsoid. I know you’re not too concerned about absolute error, and Haversine may be fine for counties that are relatively close together. But if you were to estimate the distance from, say, Seattle, Washington to Miami, Florida, the error may be larger than you’d like. Do you happen to have altitude/elevation data as well? Some counties may be near 0 m above mean sea level (MSL), while others may be over 2000 m MSL. I’m not sure how much these factors affect overall accuracy, but it may be worth testing the Haversine formula against something like Geodesy.euclidean_distance to see how much they differ.
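One way to compare Haversine against a true ellipsoidal (surface) distance without extra packages is a hand-rolled sketch of Vincenty's inverse formula on the WGS 84 ellipsoid. This is a swapped-in technique for illustration: it is not necessarily what SimpleFeatures.jl or Geodesy.jl use internally, the function names are hypothetical, and the city coordinates are approximate. Vincenty's iteration can fail to converge for nearly antipodal points, which should not come up between US counties.

```julia
# Haversine on a sphere (6,371 km mean radius assumed), in meters
function haversine_m(lat1, lon1, lat2, lon2; r = 6_371_000.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    h = sin((φ2 - φ1) / 2)^2 + cos(φ1) * cos(φ2) * sin(deg2rad(lon2 - lon1) / 2)^2
    return 2 * r * asin(sqrt(min(h, 1.0)))
end

# Vincenty's inverse formula on the WGS 84 ellipsoid, in meters
function vincenty_m(lat1, lon1, lat2, lon2; maxiter = 200, tol = 1e-12)
    a = 6_378_137.0                # semi-major axis (m)
    f = 1 / 298.257223563          # flattening
    b = (1 - f) * a                # semi-minor axis (m)
    L = deg2rad(lon2 - lon1)
    U1 = atan((1 - f) * tan(deg2rad(lat1)))   # reduced latitudes
    U2 = atan((1 - f) * tan(deg2rad(lat2)))
    sinU1, cosU1 = sin(U1), cos(U1)
    sinU2, cosU2 = sin(U2), cos(U2)

    λ = L
    sinσ = cosσ = σ = cos2α = cos2σm = 0.0
    converged = false
    for _ in 1:maxiter
        sinλ, cosλ = sin(λ), cos(λ)
        sinσ = sqrt((cosU2 * sinλ)^2 + (cosU1 * sinU2 - sinU1 * cosU2 * cosλ)^2)
        sinσ == 0 && return 0.0                # coincident points
        cosσ = sinU1 * sinU2 + cosU1 * cosU2 * cosλ
        σ = atan(sinσ, cosσ)
        sinα = cosU1 * cosU2 * sinλ / sinσ
        cos2α = 1 - sinα^2
        # cos2α is zero only when both points lie on the equator
        cos2σm = cos2α == 0 ? 0.0 : cosσ - 2 * sinU1 * sinU2 / cos2α
        C = f / 16 * cos2α * (4 + f * (4 - 3 * cos2α))
        λprev = λ
        λ = L + (1 - C) * f * sinα *
            (σ + C * sinσ * (cos2σm + C * cosσ * (-1 + 2 * cos2σm^2)))
        if abs(λ - λprev) < tol
            converged = true
            break
        end
    end
    converged || error("Vincenty iteration did not converge (nearly antipodal points?)")

    u2 = cos2α * (a^2 - b^2) / b^2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    Δσ = B * sinσ * (cos2σm + B / 4 * (cosσ * (-1 + 2 * cos2σm^2) -
         B / 6 * cos2σm * (-3 + 4 * sinσ^2) * (-3 + 4 * cos2σm^2)))
    return b * A * (σ - Δσ)
end

seattle = (47.6062, -122.3321)   # approximate
miami = (25.7617, -80.1918)      # approximate
h = haversine_m(seattle..., miami...)
v = vincenty_m(seattle..., miami...)
println("haversine: ", round(h / 1000; digits = 1), " km, ",
        "ellipsoidal: ", round(v / 1000; digits = 1), " km, ",
        "relative difference: ", abs(h - v) / v)
```

Note that the sphere's radius is itself a modeling choice, so part of any Haversine discrepancy comes from picking 6,371 km.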

As the name suggests, this computes the straight-line distance, which is only accurate for short distances; see the doc reference here.
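To see why a straight-line distance falls short over long ranges, here is a sketch that converts lat/lon/height on the WGS 84 ellipsoid to Earth-centered, Earth-fixed (ECEF) coordinates and measures the chord through the Earth. It does not call Geodesy.jl; it hand-rolls the same straight-line idea, and `ecef`/`straight_line_m` are hypothetical names. It also shows how elevation enters a straight-line distance directly. Coordinates are approximate.

```julia
# WGS 84 ellipsoid constants
const A_WGS84 = 6_378_137.0                   # semi-major axis (m)
const F_WGS84 = 1 / 298.257223563             # flattening
const E2_WGS84 = F_WGS84 * (2 - F_WGS84)      # first eccentricity squared

# Geodetic latitude/longitude (degrees) and height above the ellipsoid (m) -> ECEF (m)
function ecef(lat, lon, h = 0.0)
    φ, λ = deg2rad(lat), deg2rad(lon)
    N = A_WGS84 / sqrt(1 - E2_WGS84 * sin(φ)^2)   # prime-vertical radius of curvature
    return ((N + h) * cos(φ) * cos(λ),
            (N + h) * cos(φ) * sin(λ),
            (N * (1 - E2_WGS84) + h) * sin(φ))
end

# Straight-line (chord) distance between two ECEF points
straight_line_m(p, q) = sqrt(sum((p .- q) .^ 2))

# Surface distance on a sphere for comparison (6,371 km radius assumed)
function haversine_m(lat1, lon1, lat2, lon2; r = 6_371_000.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    h = sin((φ2 - φ1) / 2)^2 + cos(φ1) * cos(φ2) * sin(deg2rad(lon2 - lon1) / 2)^2
    return 2 * r * asin(sqrt(min(h, 1.0)))
end

seattle = (47.6062, -122.3321)
miami = (25.7617, -80.1918)
chord = straight_line_m(ecef(seattle...), ecef(miami...))
arc = haversine_m(seattle..., miami...)
println("straight line: ", round(chord / 1000; digits = 1), " km, ",
        "surface (haversine): ", round(arc / 1000; digits = 1), " km")

# Elevation: same lat/lon, 1600 m apart vertically
up = straight_line_m(ecef(39.7392, -104.9903, 1600.0), ecef(39.7392, -104.9903, 0.0))
println("vertical offset only: ", round(up; digits = 3), " m")
```

The chord cuts through the Earth, so over a cross-country distance it comes out shorter than any surface path, while for nearby points the two are nearly identical.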