ANN: OnlineStats 1.0

Just this week, I’ve released 1.0 versions of OnlineStatsBase and OnlineStats. The OnlineStats interface has been stable for quite some time and I think 1.0 versions were overdue.

So please check it out, try it, open issues, and hit me up with questions!


Hi!

Can you please clarify the use case? Is it only for large data that does not fit in memory, so that if my data does fit, I don't need to use it? Or also some visual trend compression tools?

Is this package intended only for cumulative (growing-data) statistics, or can it also be used for sliding-window stats, like this package: https://github.com/JeffreySarnoff/RollingFunctions.jl ?

The main use case is bigger-than-memory data, but OnlineStats is also a natural solution for when you are generating or observing data on the fly and you’re only interested in statistics, not the data itself.
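To make the "statistics, not the data itself" point concrete, here is a minimal sketch of fitting running statistics one observation at a time. The `observations` generator is a hypothetical stand-in for whatever produces your data (a sensor, a simulation, a file read in chunks); only `Mean`, `Variance`, `fit!`, `value`, and `nobs` come from OnlineStats.

```julia
using OnlineStats  # assumes the package is installed (] add OnlineStats)

# Hypothetical data source: yields one observation at a time,
# so the full series is never held in memory.
observations = (sin(t / 10) + 0.1 * randn() for t in 1:100_000)

m = Mean()
v = Variance()

for x in observations
    fit!(m, x)  # update the running mean with a single observation
    fit!(v, x)  # update the running variance with the same observation
end

println("n = ", nobs(m), ", mean = ", value(m), ", variance = ", value(v))
```

Each stat keeps only a small fixed-size state, so memory use does not grow with the number of observations.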

Or also some visual trend compression tools?

Yes, OnlineStats has a few novel techniques/data structures for data viz. Note that data is often “too big” to plot much before it’s “too big” for your laptop to load into memory.

Is this package intended only for cumulative (growing-data) statistics, or can it also be used for sliding-window stats, like this package: https://github.com/JeffreySarnoff/RollingFunctions.jl ?

Sort of. You can add exponential weighting to just about anything, which behaves about the same as a sliding window. For example:

Mean(weight = ExponentialWeight(.01))
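To illustrate why exponential weighting behaves roughly like a sliding window, here is a small sketch in plain Julia. It does not use OnlineStats internals: `ewmean` is a hypothetical helper that applies the textbook update `μ = (1 - λ) * μ + λ * x`, and treating `1 / λ` as the "effective" window length is only a rule of thumb I'm assuming for the comparison.

```julia
using Statistics  # standard library, for `mean`

# Hypothetical helper: exponentially weighted mean.
# Each new observation gets weight λ; the previous estimate keeps 1 - λ.
function ewmean(xs, λ)
    μ = float(first(xs))              # start from the first observation
    for x in Iterators.drop(xs, 1)
        μ = (1 - λ) * μ + λ * x
    end
    return μ
end

# Signal whose level shifts from 0 to 5 halfway through.
xs = vcat(zeros(5_000), fill(5.0, 5_000))

λ = 0.01
window = round(Int, 1 / λ)            # rough effective window (an assumption)

ew = ewmean(xs, λ)                    # exponentially weighted mean
sw = mean(xs[end-window+1:end])       # plain mean of the last `window` points
cumulative = mean(xs)                 # equally weighted mean of everything

println((ew = ew, sliding = sw, cumulative = cumulative))
```

The exponentially weighted and sliding-window estimates both track the recent level of the signal, while the equally weighted mean still averages over the old data. The difference is that exponential weighting never drops old observations outright; it only shrinks their influence.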

Speaking of visual trend compression: I work with long-term time series (for example, days of 1000 Hz signals), where similar techniques apply. You are interested not only in the data itself but also in some processing results, and for different time scales you have different processed “layers” of data. To inspect that data, you need several plots at different time scales, linked to each other like a slider; see, for example: Interactive plot, acting as a range slider tool for another plot?

For the densest (highest-rate) data, you plot a small time window that is loaded interactively as you navigate to another time, so you don’t have to keep the whole dataset in memory.
For the sparse (compressed) data, you use a large time window, so you can see the whole picture and still take a detailed look at any small time window if needed.