Matching filtered Vector{String} to non-filtered Matrix{Any} - GPS station names

GeodeticR · February 15, 2024, 3:21am

Hello all,

I’ve been trying to match an array of GPS station names against a larger table. I filtered the names so that each station covers the time frame I’m interested in and collected data for more than 93% of that time.

The issue is that I have a non-filtered Matrix containing all the GPS station names, longitudes, and latitudes.

Here is the overall code that I’ve implemented:

using HTTP, DelimitedFiles

midas = readdlm("midas.NA.txt")

filter1 = (midas[:,3] .<= 2001.5000) .& (midas[:,4] .>= 2006.0000)
lon_lat = midas[:, [1, 26, 25]]
midasfilter1 = midas[filter1, :]


writedlm("Stats1.txt", midasfilter1[:,1]) 

#2

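function empty_set(url)
	try
		time_series = HTTP.get(url)
		stringy = String(time_series.body)
		linefilter = length(split(stringy, "\n"))
		# Treat a series with fewer than 1700 lines as too sparse
		return linefilter < 1700
	catch e
		# A failed download counts as an empty set
		return true
	end
end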

stats2_names = readdlm("Stats1.txt", String)

IGS14_url = "http://geodesy.unr.edu/gps_timeseries/tenv3/IGS14/"

midas_bytes = String[]

for station in stats2_names 

	station = strip(station)
	
	url1 = IGS14_url * "$station.tenv3"
	
	if !empty_set(url1)
		push!(midas_bytes, station)
	end
end

I’m still very new to Julia and this type of analysis (which I really haven’t even gotten to yet), but the non-filtered Matrix is here:

lon_lat
8749×3 Matrix{Any}:
 "1LSU"   -91.1803  30.4074
 "1NSU"   -93.0976  31.7508
 "1ULM"   -92.0759  32.529
 "299C"  -142.076   64.0289
 "3RIV"   -72.5761  46.3148
 "59WE"  -112.183   33.4311
 "5PTS"  -120.265   36.4292
 "7OAK"  -114.759   37.595
 "7ODM"  -117.093   34.1164

and the filtered stations that I need to match the longitude and latitudes of are here:

midas_bytes
650-element Vector{String}:
 "7ODM"
 "AGMT"
 "AHID"
 "AIS5"
 "AIS6"
 "ALAM"
 "ALBH"

All I want to do is create a new matrix that pairs each midas_bytes element with its longitude and latitude from lon_lat, and write that to a text file that I can then plot.

In terms of steps I’ve taken to try and match the station names, I’ve done the following to make sure that the first column of each is actually outputting the station names, which it is:

for (i,stations) in enumerate(lon_lat[:, 1])
    println("station $i: $stations")
    if i >= 100
        break
    end
end
station 1: 1LSU
station 2: 1NSU
station 3: 1ULM
station 4: 299C
station 5: 3RIV
station 6: 59WE
station 7: 5PTS
station 8: 7OAK
station 9: 7ODM

and:

for (i,stations) in enumerate(midas_bytes)
    println("station $i: $stations")
    if i >= 100
        break
    end
end
station 1: 7ODM
station 2: AGMT
station 3: AHID
station 4: AIS5
station 5: AIS6
station 6: ALAM
station 7: ALBH
station 8: ALGO
station 9: AMC2
station 10: ANP5

Which, at least to me, means they should match if I compare the two data sets?

But when I tried to implement the following potential solution I found on this site:

ll_f = filter(x -> x[1] in midas_bytes, lon_lat)

I would get the following:

Any[]

I tried various iterations of that same filter that included SplitApplyCombine, eachcol(), Base.Iterators and whatever other suggestions I found, both on this site and from ChatGPT.

I’m sure there is some simple process that I’m just too green to come up with. But I’m at my wits’ end and have been banging my head against the wall for the last two days trying to figure out what else I could do, in what I originally thought would be a quick step, to get a new text file that has the filtered midas_bytes list with the corresponding lons and lats.

Any insight would be greatly appreciated. Thank you! I’ve attached the Julia code with all my notes in case it would be helpful to see the thought process behind my workflow.

gps_iterate.jl (3.8 KB)

The text file from the beginning of the script is here: midas.NA.txt

A grateful first year grad student.

nilshg · February 15, 2024, 10:14am

I might be misunderstanding, but aren’t you just trying to do a leftjoin here? There are more lightweight options available, but the simplest is probably just to use a standard DataFrames workflow:

julia> using DataFrames

julia> lldf = DataFrame(lon_lat, [:station, :lon, :lat])
9×3 DataFrame
 Row │ station  lon       lat
     │ Any      Any       Any
─────┼────────────────────────────
   1 │ 1LSU     -91.1803  30.4074
   2 │ 1NSU     -93.0976  31.7508
   3 │ 1ULM     -92.0759  32.529
   4 │ 299C     -142.076  64.0289
   5 │ 3RIV     -72.5761  46.3148
   6 │ 59WE     -112.183  33.4311
   7 │ 5PTS     -120.265  36.4292
   8 │ 7OAK     -114.759  37.595
   9 │ 7ODM     -117.093  34.1164

julia> mdf = DataFrame(station = midas_bytes);

julia> leftjoin(mdf, lldf, on = :station)
7×3 DataFrame
 Row │ station  lon       lat
     │ String   Any       Any
─────┼────────────────────────────
   1 │ 7ODM     -117.093  34.1164
   2 │ AGMT     missing   missing
   3 │ AHID     missing   missing
   4 │ AIS5     missing   missing
   5 │ AIS6     missing   missing
   6 │ ALAM     missing   missing
   7 │ ALBH     missing   missing
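
For anyone who wants one of the more lightweight options, here is a minimal sketch in plain Base Julia. It also shows why the original filter call came back as Any[]: filter on a Matrix visits every individual element rather than every row, so x[1] is either the first character of a name or a number, and neither is ever in midas_bytes. The small lon_lat and midas_bytes below are toy stand-ins built from the excerpts in this thread, not the real 8749-row table.

```julia
# Toy stand-ins for the thread's `lon_lat` and `midas_bytes`
# (values copied from the excerpts above; the real data is much larger).
lon_lat = Any["1LSU" -91.1803 30.4074;
              "7OAK" -114.759 37.595;
              "7ODM" -117.093 34.1164]
midas_bytes = ["7ODM", "AGMT"]

# `filter` on a Matrix walks over single elements, not rows,
# so this comparison never succeeds and the result is empty.
elementwise = filter(x -> x[1] in midas_bytes, lon_lat)

# Instead, build a Boolean mask over the first column
# and use it to index whole rows.
wanted = Set(midas_bytes)
keep = [name in wanted for name in lon_lat[:, 1]]
ll_f = lon_lat[keep, :]
```

Note that the rows of ll_f come out in the order of lon_lat, not in the order of midas_bytes, and names without a match (AGMT in this toy example) are simply absent.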

GeodeticR · February 15, 2024, 4:35pm

Man, can’t believe I didn’t stumble across something like this earlier! Or maybe I did and glanced right over it.

Once I join the two, would I be able to apply a process similar to what I did above to get rid of the “missing” station names? Otherwise I’m going to have around 8000 lines of no use. Lol

Thank you!

Jeff_Emanuel · February 15, 2024, 5:07pm

https://dataframes.juliadata.org/stable/lib/functions/#Base.filter
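
A minimal sketch of what that could look like, assuming a joined table with the same column names as in the example above. The DataFrame here is a two-row stand-in, not the real join result.

```julia
using DataFrames

# Two-row stand-in for a join result; AGMT has no coordinates here.
joined = DataFrame(station = ["7ODM", "AGMT"],
                   lon = [-117.093, missing],
                   lat = [34.1164, missing])

# Keep only the rows whose `lon` entry is not missing.
matched = filter(:lon => !ismissing, joined)
```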

GeodeticR · February 15, 2024, 5:18pm

Thank you much! I knew it would be something simple.

simsurace · February 15, 2024, 5:31pm

There is also dropmissing
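
A short sketch of dropmissing on a stand-in table with the thread's column names (the data itself is a toy example):

```julia
using DataFrames

# Two-row stand-in for a join result; AGMT has no coordinates here.
joined = DataFrame(station = ["7ODM", "AGMT"],
                   lon = [-117.093, missing],
                   lat = [34.1164, missing])

# Drop every row that has a missing value in `lon` or `lat`.
clean = dropmissing(joined, [:lon, :lat])
```

By default this also narrows the column types, so lon and lat no longer allow missing afterwards.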

GeodeticR · February 15, 2024, 6:22pm

Thank you! Lots of things to add to my repertoire. I really appreciate the helpfulness of the community.

GeodeticR · February 15, 2024, 6:51pm

Never mind! Looks like once I applied the left join it automatically omitted the “missing” lines here. I hate how straightforward that is, and yet, I was not clever enough to find it on my own! Thank you again. You saved me. lol

leftjoin(gps_df, lldf, on = :station)
650×3 DataFrame
 Row │ station  lon       lat
     │ String   Any       Any
─────┼────────────────────────────
   1 │ 7ODM     -117.093  34.1164
   2 │ AGMT     -116.429  34.5943
   3 │ AHID     -111.064  42.7731
   4 │ AIS5     -131.6    55.0691
   5 │ AIS6     -131.599  55.0689
   6 │ ALAM     -115.158  37.358
   7 │ ALBH     -123.487  48.3898
   8 │ ALGO     -78.0714  45.9558
   9 │ AMC2     -104.525  38.8031
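
One note for later readers: leftjoin keeps every row of the left table, so it does not drop anything by itself. The result above presumably has no missing values because all 650 filtered stations have a row in the full lon_lat, whereas the earlier example only used the 9-row excerpt. If you want unmatched names dropped explicitly, innerjoin does that. Below is a rough sketch that also writes the result to a text file; the tables are toy stand-ins, and "ZZZZ" and the output path are made up for illustration.

```julia
using DataFrames, DelimitedFiles

# Toy stand-ins for `lldf` and `gps_df` (coordinates copied from the thread).
lldf = DataFrame(station = ["1LSU", "7ODM", "AGMT"],
                 lon = [-91.1803, -117.093, -116.429],
                 lat = [30.4074, 34.1164, 34.5943])
gps_df = DataFrame(station = ["7ODM", "AGMT", "ZZZZ"])  # "ZZZZ" is a made-up name with no match

# innerjoin keeps only the stations present in both tables.
matched = innerjoin(gps_df, lldf, on = :station)

# Write station, lon, lat as tab-separated text (hypothetical output path).
outfile = joinpath(mktempdir(), "matched_stations.txt")
writedlm(outfile, hcat(matched.station, matched.lon, matched.lat))
```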

Jeff_Emanuel · February 15, 2024, 6:56pm

GeodeticR · February 15, 2024, 7:44pm

Awesome, and it is available at my university library. Appreciate the recommendation!
