Common data utilities for time series

I frequently find myself having to clean up and process raw ‘real’ data. I am really not a data person, but I think this is called data plumbing. And to be honest, I don’t have much fun doing it.

Typical operations include:

  • Finding gaps in the data
  • Imputation
  • ffill / bfill / Interpolation
  • Outlier detection and correction
  • Resampling
  • Identifying usable sections in noisy data
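
To make two of these concrete, here is a minimal sketch of gap finding and forward fill in plain Julia, using only the Dates standard library. The names `find_gaps` and `ffill` are made up for illustration and are not taken from any package. The sketch assumes the timestamps are sorted and that the step is a fixed-length period (not `Month` or `Year`).

```julia
using Dates

# Hypothetical helper: return the index pairs (i, i + 1) where two consecutive
# timestamps are further apart than `maxstep`. Assumes `t` is sorted and
# `maxstep` is a fixed-length period such as Second, Minute or Hour.
function find_gaps(t::AbstractVector{<:TimeType}, maxstep::Period)
    return [(i, i + 1) for i in firstindex(t):lastindex(t)-1 if t[i+1] - t[i] > maxstep]
end

# Hypothetical helper: forward fill. Each `missing` is replaced by the last
# value before it. Leading missings stay missing because there is nothing
# to carry forward.
function ffill(x::AbstractVector)
    out = copy(x)
    for i in firstindex(out)+1:lastindex(out)
        if ismissing(out[i])
            out[i] = out[i-1]
        end
    end
    return out
end

# Samples at minutes 0, 1, 2, 5 and 6, with two missing values.
t = DateTime(2024, 1, 1) .+ Minute.([0, 1, 2, 5, 6])
x = [1.0, missing, 3.0, missing, 5.0]

gaps = find_gaps(t, Minute(1))   # one gap, between the 3rd and 4th samples
filled = ffill(x)
```

Note that `ffill` copies its input; an in-place variant would write into `x` directly instead.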

Some low-level functions for these operations are provided by packages like DSP.jl, DataFrames.jl, and Interpolations.jl. But it takes some work to hook them up to your specific time-aware data source and format, and as a result I have ended up writing multiple implementations of resample over the past few years.

So ideally what I want is a very lightweight package that works directly on Tables.jl data sources, preferably lazily or in place with minimal copying. All you would need to know is: what is my data axis? What is my time axis? Which operations do I want to apply? And then just do it. That way we don’t have to maintain our own time-aware types or index columns, like TSFrames.jl and TimeSeries.jl do. It should also be trivial to stack multiple operations using, for example, Chain.jl, to operate on multiple columns with multithreading, and to add new processing functions, either directly or through extensions.
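
As a rough illustration of the interface I have in mind, here is a sketch that works on a NamedTuple of vectors, which Tables.jl treats as a column table. Everything in it is hypothetical: `apply_columns` and `interp_missing` are names I made up, and the `op(time, values)` signature is only one possible convention. It is not lazy or in place, it just shows the "name the time axis, name the data columns, pick an operation" idea.

```julia
using Dates

# Hypothetical API sketch: apply `op(time, values)` to each selected data
# column of a column table, given only the column names. No dedicated
# time-series type or index column is involved.
function apply_columns(op, table::NamedTuple; time::Symbol, data)
    t = getproperty(table, time)
    updated = NamedTuple{Tuple(data)}(Tuple(op(t, getproperty(table, c)) for c in data))
    return merge(table, updated)   # untouched columns are passed through
end

# Example operation: linearly interpolate interior `missing` values against
# the time axis. Leading and trailing missings are left as they are.
function interp_missing(t, x)
    out = Vector{Union{Missing,Float64}}(x)
    known = findall(!ismissing, x)
    for (a, b) in zip(known[1:end-1], known[2:end])
        for i in a+1:b-1
            w = (t[i] - t[a]) / (t[b] - t[a])   # fraction of the way from a to b
            out[i] = (1 - w) * x[a] + w * x[b]
        end
    end
    return out
end

table = (time = DateTime(2024, 1, 1) .+ Second.([0, 10, 20, 40]),
         a = [1.0, missing, 3.0, 5.0],
         b = [0.0, missing, 20.0, 40.0])

# Two stacked operations: interpolate both columns, then clip column `b`.
# The clipping step assumes no missings remain in `b`.
step1 = apply_columns(interp_missing, table; time = :time, data = (:a, :b))
result = apply_columns((t, x) -> clamp.(x, 0.0, 30.0), step1; time = :time, data = (:b,))
```

Because every step takes a table and returns a table, the steps could be stacked with Chain.jl or plain function composition, and the per-column loop is the natural place to add multithreading.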

Does such a package exist, and did I miss it? Should it exist? Any comments or ideas are welcome.

I’m almost sure you can do all that with GMT.jl. Some recipes are already implemented at a high level; for others you would probably need to dig into the docs. See these examples

Thanks! That definitely has some useful features. I didn’t expect to find that in a package for geographic data. My idea was to have a package that is focused on tabular and time-series data.

GMT is Generic. Both geographic and Cartesian.

Give Impute.jl a try. I was quite impressed.

TableTransforms.jl has plenty of transforms for Tables.jl.