I have a total of 1 million stations numbered from 1 to 1 million. For each of them, how do I quickly identify the rest of the stations (Station numbers) that are within 2 km of distance?

Any tips would be greatly appreciated. Many thanks.

I have a total of 1 million stations numbered from 1 to 1 million. For each of them, how do I quickly identify the rest of the stations (Station numbers) that are within 2 km of distance?

Any tips would be greatly appreciated. Many thanks.

How are the station positions defined? Which type of distance you want (euclidean, or something else)?

(The solution is probably in the NearestNeighbors package, or similar alternatives).

1 Like

Many thanks for the reply.

These stations are from the global surface ocean. Each of them have its own longitude and latitude information. The distance should be the closest point-to-point distance on a globe.

step 1 is to sort by laditude. then divide the points into 2km laditude bins and sort each bin by longitude. then for each point, you can get the list of possible close points by looking at 3 bins and doing a binary search to see which have close enough longitudes (this is latitude dependent). then for each candidate identified, you can perform a distance test to find the actual distance. Note that you will need a minor modification of this for the northern and southern most bins, but it is still relatively straightforward.

As described, this algorithm is `O(n*log(n)+close_pairs)`

but with a little care, could be dropped to `O(n+close_pairs)`

although as described it should be pretty fast already.

4 Likes

using the `NearestNeighbors.jl`

package you can do this: first convert the spherical coordinates into cartesian coordinates, adjust the cutoff to a 3D cutoff, and do:

```
using NearestNeighbors
using StaticArrays
function points_in_sphere(r,N)
p = SVector{3,Float64}[]
for ip in 1:N
θ = π*rand()
ϕ = 2π*rand()
x = r * sin(ϕ) * cos(θ)
y = r * sin(ϕ) * sin(θ)
z = r * cos(ϕ)
push!(p, SVector(x,y,z))
end
return p
end
function cartesian_cutoff(r, arch_cutoff)
angle = arch_cutoff / r
cutoff = r * sqrt(2 * (1-cos(angle)))
return cutoff
end
function main(;N=10^6)
r = 6371 # km - earth radius
arch_cutoff = 2 # km
p = points_in_sphere(r, N)
cutoff = cartesian_cutoff(r, arch_cutoff)
tree = BallTree(p)
inrange(tree, p, cutoff)
end
```

This takes 2s in my computer, thus probably solves the problem, although there may be smarter ways that do not involve converting the points to 3D space.

3 Likes

This is awesome! Thank you so much.