Hello!
When processing data that does not fit in RAM, which is more efficient: manual serialization or a swap file?
Splitting the algorithm into independent subtasks will not work, because at each stage of the computation any part of the data may be needed.
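For what it's worth, the "swap file" side of the comparison can be made explicit in user code with Julia's Mmap standard library: a file-backed array that the OS pages in and out of RAM as it is touched. A minimal sketch, where the file name and dimensions are purely illustrative:

```julia
using Mmap

io = open("big.bin", "w+")                       # illustrative scratch file
A = Mmap.mmap(io, Matrix{Int32}, (5, 1_000_000)) # file-backed, not fully resident in RAM
A[:, 1] .= Int32(42)                             # touching a region faults in just those pages
Mmap.sync!(A)                                    # flush dirty pages to disk
close(io)
```

The trade-off versus manual serialization is that the OS decides what stays in memory, which is convenient but gives you no control over access patterns.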
oheil
Thanks, but I don't want to write queries against one giant file, or rewrite it whenever the data changes.
It is better and clearer to have many files in a uniform format plus two functions, serialize() and deserialize().
Though ideally I wouldn't have to think about any of this at all :)
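A minimal sketch of that pattern using the Serialization standard library; the file layout and helper names here are my own, not from any package:

```julia
using Serialization

# Illustrative layout: one file per chunk of the data.
chunk_path(dir, i) = joinpath(dir, "chunk_$i.bin")

# Write each chunk to its own file, so only one chunk must be in RAM at a time.
function serialize_chunks(dir, chunks)
    mkpath(dir)
    for (i, chunk) in enumerate(chunks)
        serialize(chunk_path(dir, i), chunk)
    end
end

# Load a single chunk back on demand.
deserialize_chunk(dir, i) = deserialize(chunk_path(dir, i))
```

For example, `serialize_chunks("scratch", Iterators.partition(data, 100_000))` would spill `data` to disk in 100,000-element pieces.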
johnh
Please have a look at GitHub - xiaodaigh/JDF.jl: Julia DataFrames serialization format.
On the hardware side of things, if you need more capacity, consider zram on Linux, which uses a portion of RAM as compressed swap.
Also, if new hardware is an option, there is Intel Optane memory, which acts as a slower, cheaper tier of memory.
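For reference, basic JDF.jl usage looks roughly like this, based on the examples in its README (check the repo for the current API); the column names and sizes are chosen to match the example in the next post:

```julia
using DataFrames, JDF

# A frame shaped like the 5 x 1_000_000 Int32 case discussed below.
df = DataFrame([Symbol("c$i") => rand(Int32, 1_000_000) for i in 1:5])

JDF.save("df.jdf", df)              # writes a df.jdf folder on disk
df2 = DataFrame(JDF.load("df.jdf")) # reads the columns back into a DataFrame
```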
Very cool! When deserializing a 5x1000000 Int32 JDF dataframe, it is 460 times faster than the built-in deserialization. How is that possible?