Hi all
I have ZMQ streaming data that I want to display on a webpage. The data arrives at about 100 updates per second. It is entirely Float64 and each message carries the ROW and COLUMN identifying the VALUE (Float64) to be updated. There would be about 100 rows and 20 Float64 columns.
So I could use either a DataFrame or an array with Dash/Genie. Since the data would be updating an in-memory structure, which is more efficient to use: an array or a DataFrame?
Thank you.
If it is 20 columns that are all Float64,
then a vector (or named tuple) of vectors will be faster. However, 100 rows per second is not much, so probably any data structure will work. For example, with DataFrames.jl you have:
julia> df = DataFrame([Float64[] for _ in 1:20], :auto);

julia> @time foreach(_ -> push!(df, rand(20)), 1:1_000_000)
2.952809 seconds (63.00 M allocations: 1.184 GiB, 16.02% gc time, 0.50% compilation time)
so as you can see, in 3 seconds you can generate and add 1 million rows to a data frame.
And you can check how long a single push! like this takes:
julia> using BenchmarkTools
julia> @benchmark push!($df, rand(20))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.256 μs … 20.722 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.489 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.661 μs ± 871.836 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃█▅▆▂
█████▅▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
2.26 μs Histogram: frequency by time 6.98 μs <
Memory estimate: 1.17 KiB, allocs estimate: 62.
so most likely adding rows will not be a bottleneck in your use case.
(if you wanted to push millions of rows then it would matter)
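For comparison, here is a minimal sketch of the named-tuple-of-vectors alternative mentioned above (the column names and the pushrow! helper are made up for illustration, not an established API):

```julia
# One Vector{Float64} per column; a NamedTuple gives access by column name.
cols = (a = Float64[], b = Float64[], c = Float64[])

# Appending a "row" is just one push! per column.
function pushrow!(cols, vals)
    for (col, v) in zip(cols, vals)
        push!(col, v)
    end
    return cols
end

pushrow!(cols, (1.0, 2.0, 3.0))
pushrow!(cols, (4.0, 5.0, 6.0))
```

This skips the per-row bookkeeping that DataFrames does inside push!, which is where the speed difference comes from.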
First of all, thank you for your guidance; as always, it adds to my toolkit. It seems to me that you are building a DataFrame and populating it, ending up with a static structure. That's not my use case, and, rereading the OP, that's my fault for failing to describe it properly.
I would be constructing a fixed structure of 100 rows and 12 columns. The first column holds static values (an index), and the 12 Float64 columns (the values) associated with the rows will change as the data comes in. So a ZMQ message arrives and is used to update the relevant existing value: go to the row (via the index) and column, and overwrite the value.
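A minimal sketch of that update step, assuming a DataFrame with one static index column followed by the Float64 value columns (the column names and the update! helper here are hypothetical, just for illustration):

```julia
using DataFrames

# Hypothetical fixed-size table: one static index column, 12 Float64 value columns.
df = DataFrame(:index => string.("row", 1:100))
for j in 1:12
    df[!, Symbol("col", j)] = zeros(100)
end

# Overwrite one cell in place for an incoming (row, column, value) message.
function update!(df, row::Integer, col::Integer, value::Float64)
    df[row, col + 1] = value   # +1 skips the static index column
    return nothing
end

update!(df, 5, 3, 42.0)
```

Since the table size never changes, each incoming message costs only a single in-place assignment into an existing column vector.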
It seems that because my memory footprint is so small (100 rows of 10-character strings, times 12 columns of Float64), with values being updated selectively at low frequency (100/sec), it doesn't matter what I pick.
would that be fair to say?
thank you for taking the time to help me ( as always)
theakson
Yes - with 100 ops/second the cost of managing such a data structure in Julia should be negligible.
Again, thank you so much for taking the time. I always learn something new. I am REALLY looking forward to your new book.