TimeSeries Database

I have a process (a websocket) that pushes data into a channel in real time (a couple of items every second). Each data point has a timestamp, group columns, and additional fields. I want to store this data continuously in a table or database, partitioned by hour and group column. Since this is time series data, I want to store it in a column-based database.

My search led me to the JuliaDB.jl package, which seems to be unmaintained. Is there another package for this task? Also, please suggest how I can store data from the channel into partitions by hour.

Note that TimeArrays (the data type in TimeSeries.jl) are immutable, so appending data is relatively expensive: the whole TimeArray is copied on each operation.

I have a similar application. I push new data to a regular Julia array and the accompanying timestamp to a separate DateTime vector. If you want to store the data for later, you could move it to a TimeArray every hour.
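A minimal sketch of that buffering idea, using only the `Dates` standard library. The `Tick` record, the `Partition` buffer, and `collect_partitions` are hypothetical names, and a single group column and a single numeric field are assumed for brevity. Each partition keeps its timestamps and values in separate plain vectors, keyed by the hour and the group:

```julia
using Dates

# Hypothetical record layout: timestamp, one group column, one numeric field.
struct Tick
    ts::DateTime
    group::String
    value::Float64
end

# Column buffers for one (hour, group) partition.
struct Partition
    ts::Vector{DateTime}
    value::Vector{Float64}
end
Partition() = Partition(DateTime[], Float64[])

# Partition key: the start of the hour the record falls in, plus its group.
partition_key(t::Tick) = (floor(t.ts, Hour(1)), t.group)

# Drain a channel into per-partition buffers. The loop ends when the
# channel is closed and empty.
function collect_partitions(ch::Channel{Tick})
    parts = Dict{Tuple{DateTime,String},Partition}()
    for t in ch
        buf = get!(parts, partition_key(t)) do
            Partition()
        end
        push!(buf.ts, t.ts)
        push!(buf.value, t.value)
    end
    return parts
end

# Small demonstration with three records spanning two hours.
ch = Channel{Tick}(32)
put!(ch, Tick(DateTime(2024, 1, 1, 10, 5), "A", 1.0))
put!(ch, Tick(DateTime(2024, 1, 1, 10, 55), "A", 2.0))
put!(ch, Tick(DateTime(2024, 1, 1, 11, 1), "A", 3.0))
close(ch)
parts = collect_partitions(ch)
```

In a long-running process you would presumably write out and drop a partition once its hour has passed, rather than keeping everything in memory until the channel closes.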

Have you looked into DataStructures.jl? There are things like ring buffers that may help if you want to keep a fixed number of records.

I would just use DataFrames, and I suggest keeping the date and time parts in separate columns. When doing analysis, you can easily convert a DataFrame to a TimeArray if you wish.

As for storage format, probably Parquet? See Parquet2.jl for that.

Are you looking for persistent storage between Julia sessions? If that’s the case, then it sounds like you need an actual database, not a Julia package.

If you are looking for a package that provides a time series object (table for time series data) you can check TSFrames.jl.

If you just need storage, I think you need a database. There are certainly many packages that serve as interfaces to popular databases.

I think you need a separate database, not just a Julia package. If you are not working with TBs of data, QuestDB is good enough. It supports the InfluxDB line protocol: a TCP port is open, and after parsing each message in Julia you send it there in a structured format. You can also partition by hour. After that, you load the data into your Julia session whenever you want to do analysis.
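As a rough illustration of what sending "in a structured way" could look like, here is a sketch that formats one record as an InfluxDB line protocol line (`measurement,tag=value field=value timestamp`). The table name `ticks`, the tag `grp`, and the field `value` are made-up names, and the sketch does not escape spaces or commas in tag values. The commented-out lines assume a QuestDB instance listening for line protocol on its default TCP port 9009 on localhost:

```julia
using Dates

# Nanoseconds since the Unix epoch, the default timestamp unit of the
# line protocol. DateTime differences are in milliseconds.
unix_ns(ts::DateTime) = Dates.value(ts - DateTime(1970)) * 1_000_000

# Build one line protocol line for a record with a single tag and a single
# float field. All names here are hypothetical.
function ilp_line(table::AbstractString, group::AbstractString,
                  value::Float64, ts::DateTime)
    return string(table, ",grp=", group, " value=", value, " ", unix_ns(ts), "\n")
end

line = ilp_line("ticks", "A", 1.5, DateTime(2024, 1, 1, 10, 5))

# Sending it (assumes a server is running; left commented out):
# using Sockets
# sock = connect("localhost", 9009)
# write(sock, line)
# close(sock)
```

Hourly partitioning would then be configured on the database side when the table is created, not in the Julia code.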

Would something like OnlineStats.jl combined with a TSFrame solve your problem?

The first obvious answer to this kind of need: use SQLite (:
SQLite is probably the most well-tested software in use, and it can easily cope with this insertion rate.

From Julia, SQLite.jl is a direct interface to SQLite. Also see (my) SQLStore.jl package if you prefer an interface closer to Julia collections.

You might also want to look at DuckDB, which should be well suited to this kind of use case, is easy to use, and has a Julia interface.
