Matching filtered Vector{String} to non-filtered Matrix{Any} - GPS station names

Hello all,

I’ve been trying to match an array of GPS station names that I’ve filtered down to the time frame I’m interested in, keeping only stations whose percentage of data collection in that time is > 93%.

The issue is that I also have a non-filtered Matrix containing all the GPS station names, longitudes, and latitudes.

Here is the overall code that I’ve implemented:

using HTTP, DelimitedFiles

midas = readdlm("midas.NA.txt")

# Keep rows where column 3 is <= 2001.5 and column 4 is >= 2006.0
filter1 = (midas[:, 3] .<= 2001.5) .& (midas[:, 4] .>= 2006.0)
lon_lat = midas[:, [1, 26, 25]]
midasfilter1 = midas[filter1, :]

writedlm("Stats1.txt", midasfilter1[:, 1])

# Returns true when the time series has fewer than 1700 lines
# or when the download fails
function empty_set(url)
    try
        time_series = HTTP.get(url)
        stringy = String(time_series.body)
        linefilter = length(split(stringy, "\n"))
        return linefilter < 1700
    catch e
        return true
    end
end

stats2_names = readdlm("Stats1.txt", String)

IGS14_url = ""

midas_bytes = String[]

# Collect the stations whose time series is long enough
for station in stats2_names
    station = strip(station)
    url1 = IGS14_url * "$station.tenv3"
    if !empty_set(url1)
        push!(midas_bytes, station)
    end
end

I’m still very new to Julia and this type of analysis (which I really haven’t even gotten to yet), but the non-filtered Matrix is here:

8749×3 Matrix{Any}:
 "1LSU"   -91.1803  30.4074
 "1NSU"   -93.0976  31.7508
 "1ULM"   -92.0759  32.529
 "299C"  -142.076   64.0289
 "3RIV"   -72.5761  46.3148
 "59WE"  -112.183   33.4311
 "5PTS"  -120.265   36.4292
 "7OAK"  -114.759   37.595
 "7ODM"  -117.093   34.1164

and the filtered stations that I need to match the longitude and latitudes of are here:

650-element Vector{String}:

All I want to do is create a new matrix that pairs each matched midas_bytes element with its longitude and latitude from lon_lat, and write that to a text file that I can then plot.

To try to match the station names, I first checked that the first column of each is actually outputting the station names, which it is:

for (i, stations) in enumerate(lon_lat[:, 1])
    println("station $i: $stations")
    if i >= 100
        break
    end
end
station 1: 1LSU
station 2: 1NSU
station 3: 1ULM
station 4: 299C
station 5: 3RIV
station 6: 59WE
station 7: 5PTS
station 8: 7OAK
station 9: 7ODM


for (i, stations) in enumerate(midas_bytes)
    println("station $i: $stations")
    if i >= 100
        break
    end
end
station 1: 7ODM
station 2: AGMT
station 3: AHID
station 4: AIS5
station 5: AIS6
station 6: ALAM
station 7: ALBH
station 8: ALGO
station 9: AMC2
station 10: ANP5

Which, at least to me, means they should match if I set up some comparison between the two data sets?

But when I tried to implement the following potential solution I found on this site:

ll_f = filter(x -> x[1] in midas_bytes, lon_lat)

I would get the following:


I tried various iterations of that same filter, including SplitApplyCombine, eachcol(), Base.Iterators, and whatever other suggestions I found both on this site and from ChatGPT.

I’m sure there is some simple process that I’m just too green to come up with. But I’m at my wits’ end and have been banging my head against the wall for the last 2 days on what I originally thought would be a quick step: getting a new text file that has the filtered midas_bytes list with the corresponding lons and lats.

Any insight would be greatly appreciated. Thank you! I’ve attached the Julia code with all my notes in case it would be helpful to see the thought process behind my workflow.

gps_iterate.jl (3.8 KB)

The text file from the beginning of the script is here: midas.NA.txt

  • A grateful first year grad student. :melting_face:

I might be misunderstanding, but aren’t you just trying to do a leftjoin here? There are more lightweight options available, but the simplest is probably just to use a standard DataFrames workflow:

julia> using DataFrames

julia> lldf = DataFrame(lon_lat, [:station, :lon, :lat])
9×3 DataFrame
 Row │ station  lon       lat
     │ Any      Any       Any
   1 │ 1LSU     -91.1803  30.4074
   2 │ 1NSU     -93.0976  31.7508
   3 │ 1ULM     -92.0759  32.529
   4 │ 299C     -142.076  64.0289
   5 │ 3RIV     -72.5761  46.3148
   6 │ 59WE     -112.183  33.4311
   7 │ 5PTS     -120.265  36.4292
   8 │ 7OAK     -114.759  37.595
   9 │ 7ODM     -117.093  34.1164

julia> mdf = DataFrame(station = midas_bytes);

julia> leftjoin(mdf, lldf, on = :station)
7×3 DataFrame
 Row │ station  lon       lat
     │ String   Any       Any
   1 │ 7ODM     -117.093  34.1164
   2 │ AGMT     missing   missing
   3 │ AHID     missing   missing
   4 │ AIS5     missing   missing
   5 │ AIS6     missing   missing
   6 │ ALAM     missing   missing
   7 │ ALBH     missing   missing
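
As for the more lightweight options: one of them, sketched here without DataFrames, is to index the matrix with a per-row Boolean mask. Note that filter on a Matrix visits individual elements rather than rows, which is likely why the earlier attempt did not behave as hoped. The toy data below is copied from the output printed in this thread; the names wanted, keep, and ll_f are only illustrative.

```julia
# Toy stand-ins for the data in this thread (values copied from the printed output).
lon_lat = Any["1LSU" -91.1803 30.4074;
              "7OAK" -114.759 37.595;
              "7ODM" -117.093 34.1164]
midas_bytes = ["7ODM", "AGMT"]

# A Set makes each membership test cheap when there are thousands of rows.
wanted = Set(midas_bytes)

# One Bool per row: true when the station name in column 1 is in the filtered list.
keep = [name in wanted for name in lon_lat[:, 1]]

# Logical indexing keeps whole rows, so name, lon, and lat stay together.
ll_f = lon_lat[keep, :]
```

The result is still a Matrix{Any} with the same three columns, so it can be handed to writedlm in the same way as the original matrix.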

Man, can’t believe I didn’t stumble across something like this earlier! Or maybe I did and glanced right over it.

Once I join the two, would I be able to apply a process similar to the one above to get rid of the “missing” station names? Otherwise I’m going to have around 8000 lines of no use. Lol

Thank you!


Thank you much! I knew it would be something simple.

There is also dropmissing
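
A minimal sketch of how that could look, using a two-station toy table rather than the real data (the variable names joined, matched, and matched2 are just for illustration). Keep in mind that leftjoin keeps every row of the left table, so missing values only show up for stations that are absent from the right table.

```julia
using DataFrames

# Toy tables; station names and coordinates are taken from the output in this thread.
lldf = DataFrame(station = ["1LSU", "7ODM"],
                 lon = [-91.1803, -117.093],
                 lat = [30.4074, 34.1164])
mdf = DataFrame(station = ["7ODM", "AGMT"])

# AGMT is not in lldf here, so its lon and lat come back as missing.
joined = leftjoin(mdf, lldf, on = :station)

# dropmissing removes every row that contains at least one missing value.
matched = dropmissing(joined)

# innerjoin keeps only stations present in both tables, so there is nothing to drop.
matched2 = innerjoin(mdf, lldf, on = :station)
```

Either route should leave only the stations that have coordinates; innerjoin simply skips the intermediate table with missing values.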


Thank you! Lots of things to add to my repertoire. I really appreciate the helpfulness of the community.

Never mind! Looks like once I applied the left join, there were no “missing” lines left here. I hate how straightforward that is, and yet, I was not clever enough to find it on my own! Thank you again. You saved me. lol

leftjoin(gps_df, lldf, on = :station)
650×3 DataFrame
 Row │ station  lon       lat
     │ String   Any       Any
   1 │ 7ODM     -117.093  34.1164
   2 │ AGMT     -116.429  34.5943
   3 │ AHID     -111.064  42.7731
   4 │ AIS5     -131.6    55.0691
   5 │ AIS6     -131.599  55.0689
   6 │ ALAM     -115.158  37.358
   7 │ ALBH     -123.487  48.3898
   8 │ ALGO     -78.0714  45.9558
   9 │ AMC2     -104.525  38.8031
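
For the last step, writing the joined table to a text file for plotting, here is a rough sketch that assumes writedlm from DelimitedFiles is still the tool of choice. The table below holds just two rows copied from the output above, and the file name matched_stations.txt is made up for the example.

```julia
using DataFrames, DelimitedFiles

# Two rows copied from the joined output above, standing in for the full 650.
result = DataFrame(station = ["7ODM", "AGMT"],
                   lon = [-117.093, -116.429],
                   lat = [34.1164, 34.5943])

# Hypothetical output path; a temporary directory keeps the example side-effect free.
outfile = joinpath(mktempdir(), "matched_stations.txt")

# Matrix(result) turns the DataFrame into a plain matrix that writedlm accepts.
# The default delimiter is a tab, giving one "station lon lat" row per line.
writedlm(outfile, Matrix(result))

# Read the file back to confirm the round trip.
check = readdlm(outfile)
```

The CSV.jl package would be another common way to do this, but writedlm matches what the original script already uses.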

Awesome, and it is available at my university library. Appreciate the recommendation!