Serialize or swap file?

Hello!
When processing data that does not fit in RAM, what is more efficient: manual serialization or a swap file?

Splitting the algorithm into subtasks will not work, because at each stage of the computation any part of the available data may be needed.

Have you thought about Mmap (https://docs.julialang.org/en/v1/stdlib/Mmap/)?
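
If the data has a fixed element type and shape, a memory-mapped array lets the OS page it in and out of RAM transparently. A minimal sketch of what that looks like (the file name and dimensions are made up):

```julia
using Mmap

# Memory-map a large Int32 matrix backed by a file on disk.
# The OS pages data in and out on demand instead of holding it all in RAM.
io = open("big_matrix.bin", "w+")                # hypothetical backing file
A  = Mmap.mmap(io, Matrix{Int32}, (5, 1_000_000))

A[:, 1] .= Int32(42)      # writes go through the page cache
Mmap.sync!(A)             # flush dirty pages to disk
close(io)
```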


Thanks, but I don’t want to write queries against one giant file, or rewrite it whenever something changes.

It seems better and clearer to have many small files of the same layout and two functions, serialize() and deserialize(), roughly as sketched below.

Ideally, though, I would like not to have to think about this at all.
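
Something like the following, using the standard Serialization library; the file names and chunk layout are purely illustrative:

```julia
using Serialization

# Write each chunk of the data to its own small file ...
data = rand(Int32, 5, 100)                 # stand-in for the real data
for (i, chunk) in enumerate(eachcol(data))
    serialize("chunk_$i.bin", collect(chunk))
end

# ... and read back only the chunk that is currently needed.
c3 = deserialize("chunk_3.bin")
```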

Please have a look at JDF.jl (https://github.com/xiaodaigh/JDF.jl), a Julia DataFrames serialization format.
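
A quick sketch of how JDF.jl is typically used, assuming the JDF.save / JDF.load API from its README (the file name is hypothetical):

```julia
using DataFrames, JDF

df = DataFrame(rand(Int32, 1_000, 5), :auto)

JDF.save("df.jdf", df)                  # write columns to a .jdf folder, compressed
df2 = DataFrame(JDF.load("df.jdf"))     # load it back into a DataFrame
```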

On the hardware side of things, if you need more capacity, consider ZRAM on Linux, which creates a compressed swap device in RAM.

Also, if new hardware is an option, Intel Optane memory acts as a slower, cheaper tier of memory.


Very cool: when deserializing a 5×1,000,000 Int32 DataFrame, JDF is about 460 times faster than the built-in Serialization. How is this possible?