Accumulating distributed data

grahamas · August 28, 2019, 3:56pm

I’m running distributed simulations and want to save the results as I go, because the aggregate results are too big to hold in memory. Can someone point me to a good way to do this that doesn’t amount to saving each run in a separate file? JuliaDB almost seems right, but its out-of-core functionality seems to assume that all the data already exists. The closest I can come is saving each batch as a CSV (!) and then using loadtable to re-process the directory into a distributable binary directory.
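For concreteness, the CSV workaround described above can be sketched roughly as follows. This is a minimal illustration, not a recommended design: `run_simulation`, `save_batch`, the column names, the worker count, and the temporary directories are all hypothetical stand-ins, and the sketch assumes the workers share a filesystem with the main process. The only JuliaDB calls used are `loadtable` with its `output` keyword and, in a comment, `load`.

```julia
using Distributed
addprocs(2)                         # assumption: two local worker processes

@everywhere using DelimitedFiles    # stdlib delimited-text writer, loaded on every process
@everywhere using JuliaDB           # needed on workers as well for distributed loading

# Hypothetical stand-in for one simulation run: three rows of (run, step, value).
@everywhere run_simulation(id) = [fill(id, 3) 1:3 rand(3)]

# Write one batch to its own CSV so no process ever holds the aggregate in memory.
@everywhere function save_batch(dir, id)
    path = joinpath(dir, "batch_$(id).csv")
    open(path, "w") do io
        println(io, "run,step,value")            # header row
        writedlm(io, run_simulation(id), ',')    # comma-separated data rows
    end
    return path
end

outdir = mktempdir()                              # assumption: visible to all workers
paths = pmap(id -> save_batch(outdir, id), 1:4)   # four batches, spread across workers

# Second pass: re-process the CSV files into JuliaDB's binary on-disk format.
bindir = mkpath(joinpath(outdir, "bin"))
tbl = loadtable(paths; output = bindir)

# A later session could reopen the binary directory with `load(bindir)`.
```

The drawback is the one the question is about: every batch is written twice, once as text and once as binary, and the second pass can only start after all the simulations have finished.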
