I have a largish dataset of UK addresses in a dataframe. Each address has a grid reference locating it to the nearest metre on the British National Grid. I need to tag each address with the name of the 5km grid tile it lies in.
If/when I’ve read and understood the grid data, my plan of attack was:
- Convert the grid reference to a geometry, like:
points = Point.(zip(df.EASTING, df.NORTHING))
newdf = georef(df, points)
- geojoin the address points to the grid tile geometry:
joineddf = geojoin(newdf, tilegeometry, kind=:left, pred=((g1, g2) -> intersects(g1, g2)))
This may be a bit of a sledgehammer to crack a nut, though. The grid tiles are systematically laid out. The key bit I don’t know is the two-letter codes used for the 100km tiles (which are also the first two characters of the 5km tile names). Once the two letters are known, the digits are easily computable from the full grid reference, I think.
So I am now thinking that I just need to find the co-ordinates of the south-west corner of each 100km grid tile and the associated two-letter code for that tile. From that I can create a look-up from any grid location to the relevant 5km grid tile. I only need to do this once and then using the look-up ought to be faster than using geojoin (although geojoin seems quite fast and my code is generally very inefficient!).
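A rough sketch of what that look-up could look like in plain Julia, with no geometry involved. It rests on two assumptions that should be checked against the actual tile data. First, that the 100km letters follow the standard Ordnance Survey pattern: a 25-letter alphabet with I omitted, laid out in 5×5 blocks running west to east and north to south, with the first letter naming the 500km block and the second the 100km square inside it. Second, that the 5km tiles are named as the 10km square plus a quadrant suffix, e.g. TQ38SW. The function names `hundredkm_letters` and `tile5km` are made up for illustration.

```julia
# 25-letter grid alphabet (the letter I is omitted).
const GRID_LETTERS = collect("ABCDEFGHJKLMNOPQRSTUVWXYZ")

# Two-letter code of the 100km square containing (easting, northing), in metres.
# Assumes the standard OS lettering scheme described above.
function hundredkm_letters(easting::Integer, northing::Integer)
    e100k = fld(easting, 100_000)
    n100k = fld(northing, 100_000)
    (0 <= e100k <= 6 && 0 <= n100k <= 12) ||
        throw(DomainError((easting, northing), "outside the British National Grid"))
    row = 19 - n100k   # 100km rows counted from the north edge of the lettered grid
    col = e100k + 10   # 100km columns counted from the west edge of the lettered grid
    l1 = GRID_LETTERS[5 * fld(row, 5) + fld(col, 5) + 1]   # 500km block
    l2 = GRID_LETTERS[5 * mod(row, 5) + mod(col, 5) + 1]   # 100km square within it
    return string(l1, l2)
end

# 5km tile name, ASSUMING the "10km square + quadrant" convention (e.g. "TQ38SW").
function tile5km(easting::Real, northing::Real)
    e, n = floor(Int, easting), floor(Int, northing)
    letters = hundredkm_letters(e, n)
    e10k, erem = fldmod(mod(e, 100_000), 10_000)   # 10km digit, offset within 10km square
    n10k, nrem = fldmod(mod(n, 100_000), 10_000)
    ns = nrem >= 5_000 ? "N" : "S"
    ew = erem >= 5_000 ? "E" : "W"
    return string(letters, e10k, n10k, ns, ew)
end
```

If that holds up against the tile names in the grid data, tagging the whole dataframe would be a single broadcast, something like `df.TILE = tile5km.(df.EASTING, df.NORTHING)`, and a quick comparison against the geojoin result on a sample of rows would confirm the naming convention.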