Hi everyone:
Some background:
I have a question about data interpolation. I have some GPS station data with gaps in it. These gaps can occur for various reasons (software updates, hardware swaps, etc.). However, they affect my analysis and visualizations.
For example, if you look at the figure below, you’ll notice a sharp spike in a few of the regions right around the 2005 mark. There are others too, as you can see, but the most egregious one is my focus right now, in the hope that correcting it will improve the others.
I’ll try to be as succinct as I can in explaining the process. My first step is to take a windowed average for each GPS station individually. My second step is to group the GPS stations into unique bins based on their geographic location. My third step is to average, within each bin, the windowed averages of all the GPS stations it contains.
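To make the three steps concrete, here is a minimal sketch in plain Julia (only the `Statistics` standard library). It is not my actual code: the centered 3-sample window, the station names, the bin labels, and the helper names `windowed_mean` and `bin_means` are all assumptions made up for illustration.

```julia
using Statistics

# Step 1: centered moving average over `w` samples (w odd), ignoring missing
# values. The result is missing only where the whole window has no data.
function windowed_mean(values, w)
    half = w ÷ 2
    n = length(values)
    out = Vector{Union{Missing,Float64}}(missing, n)
    for i in 1:n
        window = skipmissing(view(values, max(1, i - half):min(n, i + half)))
        isempty(window) || (out[i] = mean(window))
    end
    return out
end

# Steps 2 and 3: for each bin, average the smoothed series of its stations
# epoch by epoch, skipping stations that have no value at that epoch.
function bin_means(smoothed, bin_of)
    bins = Dict{String,Vector{Union{Missing,Float64}}}()
    for b in unique(values(bin_of))
        members = [smoothed[s] for (s, sb) in bin_of if sb == b]
        n = length(first(members))
        series = Vector{Union{Missing,Float64}}(missing, n)
        for t in 1:n
            vals = Float64[m[t] for m in members if !ismissing(m[t])]
            isempty(vals) || (series[t] = mean(vals))
        end
        bins[b] = series
    end
    return bins
end

# Hypothetical toy data: three stations, five epochs, two bins.
raw = Dict(
    "STA1" => [1.0, 2.0, 3.0, 4.0, 5.0],
    "STA2" => [3.0, missing, 5.0, 6.0, 7.0],
    "STA3" => [10.0, 10.0, 10.0, missing, missing],
)
bin_of = Dict("STA1" => "north", "STA2" => "north", "STA3" => "south")

smoothed = Dict(name => windowed_mean(series, 3) for (name, series) in raw)
regional = bin_means(smoothed, bin_of)
```

Note that in this sketch the "south" bin depends on a single station, so its average is just that station's smoothed series.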
Here is the problem:
For a few of the regions, at specific epochs, there are only a handful of unique data points; sometimes only 1 station is active in a bin. Of the 371 stations used in my study, if only 1 station contributes to a bin's average, and its value happens to be an outlier compared to other regions, the result is a rather chaotic plot instead of one that is “smoothed out” by other stations in the bin.
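A rough way to see where this happens is to count how many stations contribute to a bin at each epoch. The sketch below is illustrative only: the data layout (a vector of equally long station series per bin), the helper names `station_counts` and `mask_sparse`, and the threshold of 2 stations are assumptions, not part of my workflow.

```julia
# Number of stations in one bin that have a value at each epoch.
# `members` is a vector of equally long station series (hypothetical layout).
function station_counts(members)
    return [count(m -> !ismissing(m[t]), members) for t in eachindex(first(members))]
end

# Replace bin averages that rest on fewer than `min_stations` stations with missing,
# so that single-station epochs can be inspected or handled separately.
function mask_sparse(bin_mean, counts; min_stations = 2)
    return [c >= min_stations ? v : missing for (v, c) in zip(bin_mean, counts)]
end

# Toy bin with three stations and three epochs.
members = [[1.0, 4.0, 3.0], [2.0, missing, missing], [missing, missing, 5.0]]
bin_mean = [1.5, 4.0, 4.0]          # per-epoch mean of the available values

counts = station_counts(members)    # the middle epoch has a single station
masked = mask_sparse(bin_mean, counts)
```

Whether masking, down-weighting, or interpolating those epochs is the right choice depends on the analysis; the counts alone at least show which spikes come from single-station bins.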
My attempted solution:
I want to interpolate the missing values to create a smoother plot. Right now I am trying to use the Interpolations.jl package, but the result looks “off”. I don’t expect it to look “natural” or anything like that, but I was wondering if anyone has any advice or insight on a more robust interpolation?
My current workflow (simplified):
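(1) load the full list of dates to compare the data against:
using DelimitedFiles, DataFrames, Interpolations
all_years = readdlm("/path/to/file/julia/mjd_decyr.txt")[:, 1]
all_years = sort(all_years)
(2) join data to identify gaps for interpolation:
# leftjoin needs two DataFrames, so wrap the date vector first
df_full = leftjoin(DataFrame(decyr = all_years), gps_df, on = :decyr, makeunique=true)
sort!(df_full, :decyr)  # the row order after a join is not guaranteed
(3) interpolate onto the full list of dates:
# set up interpolation function using existing data
# (df is assumed to hold only the rows with observed values, sorted by decyr)
interp_function = LinearInterpolation(df.decyr, df.value, extrapolation_bc=Line())
# Interpolate
interpolated_values = interp_function.(df_full.decyr)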
From there I feed those values back into the DataFrame and write to a file.
Here is the pre-interpolated data with the gaps:
And here is the interpolated plot:
The biggest issue is the end of my plot, which is completely skewed. The secondary issue is that the interpolation created a “step” in the data, which I can live with if that is the extent of what the Interpolations.jl package can do.
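One thing that may matter for the skewed end (I have not confirmed this is the cause): `extrapolation_bc=Line()` extends the last linear segment beyond the final observed date, so any dates past the last data point get extrapolated values; `Flat()` would hold the edge value instead. As a point of comparison, here is a minimal plain-Julia sketch that fills only interior gaps and leaves leading and trailing gaps as `missing`. The function name `fill_interior_gaps` and the `max_gap` keyword are hypothetical, and this is a stand-in written without Interpolations.jl, not a replacement for it.

```julia
# Linearly fill missing entries of `y` that lie between two observed values.
# Leading and trailing gaps stay missing, so nothing is extrapolated.
# Interior gaps spanning more than `max_gap` (in units of `t`) are also left missing.
function fill_interior_gaps(t, y; max_gap = Inf)
    filled = Vector{Union{Missing,Float64}}(y)
    known = findall(!ismissing, y)          # indices of observed values
    for k in 1:length(known)-1
        i, j = known[k], known[k+1]
        (j == i + 1 || t[j] - t[i] > max_gap) && continue
        for m in i+1:j-1
            w = (t[m] - t[i]) / (t[j] - t[i])   # fractional position inside the gap
            filled[m] = (1 - w) * y[i] + w * y[j]
        end
    end
    return filled
end

# Toy series in decimal years with a leading, an interior, and a trailing gap.
t = [2004.0, 2004.25, 2004.5, 2004.75, 2005.0, 2005.25]
y = [missing, 1.0, missing, missing, 4.0, missing]

filled = fill_interior_gaps(t, y)                    # interior gap filled
limited = fill_interior_gaps(t, y; max_gap = 0.5)    # 0.75-year gap left untouched
```

Capping the gap length is a design choice: it avoids drawing a straight line across a long outage, at the cost of leaving those epochs empty in the plot.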
But if anyone has a better implementation they wouldn’t mind sharing, I would be extremely grateful!
Thank you in advance!