Serialize or swap file?

Hello!
When processing data that does not fit in RAM, what is more efficient: manual serialization or a swap file?

Splitting the algorithm into subtasks will not work, because at each stage of the computation any part of the available data may be needed.

Have you thought about Mmap (https://docs.julialang.org/en/v1/stdlib/Mmap/)?
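
If the data has a fixed element type and shape, a memory-mapped array lets the OS page it in and out of RAM transparently. A minimal sketch of what that looks like (the file name and dimensions are made up):

```julia
using Mmap

# Memory-map a large Int32 matrix backed by a file on disk.
# The OS pages data in and out on demand instead of holding it all in RAM.
io = open("big_matrix.bin", "w+")                # hypothetical backing file
A  = Mmap.mmap(io, Matrix{Int32}, (5, 1_000_000))

A[:, 1] .= Int32(42)      # writes go through the page cache
Mmap.sync!(A)             # flush dirty pages to disk
close(io)
```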


Thanks, but I don’t want to write queries against one giant file, or rewrite it whenever something changes.

It seems better and clearer to have many small files of the same layout and two functions, serialize() and deserialize(), roughly as sketched below.

Ideally, though, I would like not to have to think about this at all.
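
Something like the following, using the standard Serialization library; the file names and chunk layout are purely illustrative:

```julia
using Serialization

# Write each chunk of the data to its own small file ...
data = rand(Int32, 5, 100)                 # stand-in for the real data
for (i, chunk) in enumerate(eachcol(data))
    serialize("chunk_$i.bin", collect(chunk))
end

# ... and read back only the chunk that is currently needed.
c3 = deserialize("chunk_3.bin")
```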

Please have a look at JDF.jl (https://github.com/xiaodaigh/JDF.jl), a Julia DataFrames serialization format.
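
A quick sketch of how JDF.jl is typically used, assuming the JDF.save / JDF.load API from its README (the file name is hypothetical):

```julia
using DataFrames, JDF

df = DataFrame(rand(Int32, 1_000, 5), :auto)

JDF.save("df.jdf", df)                  # write columns to a .jdf folder, compressed
df2 = DataFrame(JDF.load("df.jdf"))     # load it back into a DataFrame
```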

On the hardware side of things, if you need more capacity, consider ZRAM on Linux, which creates a compressed swap device in RAM.

Also, if new hardware is an option, Intel Optane memory acts as a slower, cheaper tier of memory.


Very cool: when deserializing a 5×1,000,000 Int32 DataFrame, JDF is about 460 times faster than the built-in Serialization. How is this possible?