Streaming data represented on webpage dataframe or array?

anon69491625 · May 25, 2022, 2:38pm

Hi all
I have ZMQ streaming data and want to display it on a webpage. Updates arrive at about 100 per second. Every value is a float64, and each message also carries the ROW and COLUMN that say where the VALUE (float64) should be written. There'd be about 100 rows and 20 float64 columns.

So I could use either a dataframe or an array with Dash/Genie. Since the incoming data would be updating an IN MEMORY structure, which is the more efficient to use: an array or a dataframe?

Thank you.

bkamins · May 25, 2022, 3:26pm

If it is 20 columns that are all Float64, then a vector (or named tuple) of vectors will be faster. However, 100 rows/second is not much, so probably any data structure will work, e.g. with DataFrames.jl you have:

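julia> using DataFrames

julia> df = DataFrame([Float64[] for _ in 1:20], :auto); # assumed setup, not shown in the original post: empty table with 20 Float64 columns

julia> @time foreach(_ -> push!(df, rand(20)), 1:1_000_000)
  2.952809 seconds (63.00 M allocations: 1.184 GiB, 16.02% gc time, 0.50% compilation time)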

So, as you can see, in about 3 seconds you can generate and add 1 million rows to a data frame.

And you can check how much time a single push like this takes:

julia> using BenchmarkTools

julia> @benchmark push!($df, rand(20))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.256 μs …  20.722 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.489 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.661 μs ± 871.836 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▅▆▂
  █████▅▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.26 μs         Histogram: frequency by time        6.98 μs <

 Memory estimate: 1.17 KiB, allocs estimate: 62.

so most likely adding rows will not be a bottleneck in your use case.

(if you wanted to push millions of rows then it would matter)
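
For comparison, the "named tuple of vectors" option mentioned above could look roughly like the following. This is only a minimal sketch in base Julia: the column names `x1` to `x20` and the helper `push_row!` are made up for illustration and do not come from any package.

```julia
# Column names x1..x20 are hypothetical; any 20 distinct symbols would do.
ncols = 20
colnames = ntuple(i -> Symbol("x", i), ncols)

# A named tuple holding one empty Float64 vector per column.
table = NamedTuple{colnames}(ntuple(_ -> Float64[], ncols))

# Hypothetical helper: append one row by pushing each value onto its column.
function push_row!(tbl::NamedTuple, row::AbstractVector{<:Real})
    length(row) == length(tbl) ||
        throw(DimensionMismatch("row length must match the number of columns"))
    for (col, v) in zip(values(tbl), row)
        push!(col, v)
    end
    return tbl
end

# Append 1000 random rows.
for _ in 1:1_000
    push_row!(table, rand(ncols))
end
```

Each column stays a plain `Vector{Float64}`, so a single column can be read with `table.x3` and handed to plotting or serialization code without conversion.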

anon69491625 · May 25, 2022, 7:11pm

First of all, thank you for your guidance; as always, it adds to my toolkit. It seems to me that you are building a dataframe and populating it, ending up with a static structure. That's not my use case and, on rereading the OP, it is my fault for failing to describe it properly.

I would be constructing a dynamic structure of 100 rows and 12 columns. The first column of the structure holds static values (the index); the 12 float64 columns (the values) associated with the rows will change as the data comes in. So a zmq message comes in, and it's used to update the existing relevant value by finding the row via the index column, going to the right column, and overwriting the value there.

It seems that because my memory footprint is so small (100 rows of 10-character strings, plus 12 columns of float64), with floats being updated selectively at low frequency (100/sec), it doesn't matter what I pick.

Would that be fair to say?

Thank you for taking the time to help me (as always).

theakson

bkamins · May 25, 2022, 8:39pm

Yes - with 100 ops/second the cost of managing such a data structure in Julia should be negligible.
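
To make the update pattern discussed here concrete, below is a minimal sketch in base Julia of a fixed 100×12 table that is overwritten in place from (row, column, value) messages. The row labels, the `Update` struct, and `apply_update!` are invented for illustration; the ZMQ receive loop is left out and replaced by a single hand-made message.

```julia
# Hypothetical message layout: row label, column number, and the new value.
struct Update
    row::String
    col::Int
    value::Float64
end

nrows, ncols = 100, 12

# Static row labels (the "index" column); these names are placeholders.
labels = ["row" * lpad(i, 3, '0') for i in 1:nrows]

# Map each label to its row number once, so each update is a single hash lookup.
rowindex = Dict(label => i for (i, label) in enumerate(labels))

# The values live in a plain matrix that is overwritten in place.
data = zeros(Float64, nrows, ncols)

# Hypothetical helper: write one incoming value into the matrix.
function apply_update!(data::Matrix{Float64}, rowindex::Dict{String,Int}, u::Update)
    data[rowindex[u.row], u.col] = u.value
    return data
end

# Stand-in for one decoded ZMQ message.
apply_update!(data, rowindex, Update("row042", 7, 3.14))
```

The same kind of assignment works on a data frame as `df[i, j] = v`, so at around 100 updates per second the choice between the two can be made on convenience rather than speed.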

anon69491625 · May 26, 2022, 12:06am

Again, thank you so much for taking the time. I always learn something new. I am REALLY looking forward to your new book.
