I have been using this code to perform lossy compression on oversampled time domain waveforms and decided to transform it into a package and make it available. (not registered yet)
The main idea behind is approximating the signal with line segments based on a fixed absolute threshold:
The code is as barebones as possible; initially I had some low pass filtering and wavelet denoising (might add them back in the future if needed).
I hope other people will benefit from this.
Cool! It looks really nice! Do you have the reference for the method used somewhere?
I didn’t find similar methods, but to be honest I didn’t search too much. I know there are papers on ECG compression with variable sample rate, but they analyse chunks of the signal and decide to decimate by a certain factor the whole chunk.(might add those in the future)
Interesting. It reminds me of using Gaussian Processes for efficient sampling of costly functions with heterogeneous noise. Something like this:
Not sure if this is useful, but a few years ago I came across the Ramer–Douglas–Peucker algorithm when researching geometry simplification techniques. That article has links and references to various related techniques.
For your specific use case, perhaps it’s not imperative that you use straight line segments in the simplified version; you might want to use a more general curve fitting method instead.
Thank you for the suggestions! It seems the method in my package is very similar to the Ramer–Douglas–Peucker algorithm and I didn’t even know it existed . The straight line segment approach came only as a consequence of compression speed. In the end, what the code does is it allocates an array corresponding to a straight line segment and subtracts the real chunk of signal in order to find the point where that error is at a maximum(it doesn’t even use the euclidean distance since it would use more than subtractions). That point will be stored as a key point of the compressed waveform.
Indeed, using more advanced piecewise curve fitting techniques can yield better results and even tackle the noise problem better, but at the expense of run time.
As a side note, the gif was just an example. The real world waveforms are much more complex than that, with sporadic sharp transients, with typical sizes of millions of points. (they are the (noisy) responses to battery fluctuations of various automotive electronic components)