Streaming data represented on a webpage: DataFrame or array?

If all 20 columns are Float64, then a vector (or named tuple) of vectors will be faster. However, 100 rows/second is not much, so probably any data structure will work. For example, with DataFrames.jl you have:

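julia> using DataFrames

julia> df = DataFrame([Float64[] for _ in 1:20], :auto); # assumed setup (not shown in the original post): 20 empty Float64 columns

julia> @time foreach(_ -> push!(df, rand(20)), 1:1_000_000)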
  2.952809 seconds (63.00 M allocations: 1.184 GiB, 16.02% gc time, 0.50% compilation time)

So, as you can see, in about 3 seconds you can generate and add 1 million rows to a data frame.

You can also check how much time a single push like this takes:

julia> using BenchmarkTools

julia> @benchmark push!($df, rand(20))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.256 μs …  20.722 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.489 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.661 μs ± 871.836 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▅▆▂
  █████▅▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.26 μs         Histogram: frequency by time        6.98 μs <

 Memory estimate: 1.17 KiB, allocs estimate: 62.

so most likely adding rows will not be a bottleneck in your use case.

(if you wanted to push millions of rows then it would matter)
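
For reference, the "named tuple of vectors" alternative mentioned at the top could look roughly like the sketch below. The column names `x1` to `x20` and the helper `pushrow!` are made up for illustration; they are not part of any package.

```julia
# A minimal sketch, assuming 20 Float64 columns named x1..x20 (hypothetical names).
ncols = 20
colnames = ntuple(i -> Symbol(:x, i), ncols)

# Column store: a NamedTuple holding one independent Float64 vector per column.
cols = NamedTuple{colnames}(ntuple(_ -> Float64[], ncols))

# Hypothetical helper: append one row by pushing each value onto its column.
function pushrow!(cols, row)
    length(row) == length(cols) ||
        throw(ArgumentError("row length must match the number of columns"))
    for (col, v) in zip(cols, row)
        push!(col, v)
    end
    return cols
end

# Simulate 100 incoming rows (one second of data at 100 rows/second).
for _ in 1:100
    pushrow!(cols, rand(ncols))
end
```

A named tuple of vectors can be passed to `DataFrame(cols)` later if you need data frame functionality, so you can start with either representation and switch when it becomes convenient.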
