Fastest way to save a large number of DataFrames to disk

Inspired by this post (Writing Arrow files by column), I found a fairly fast way to save a large number of DataFrames into a single Arrow file:

using Arrow

open(Arrow.Writer, "test1.arrow") do writer
    for df in values(df_dict_rand)
        # each DataFrame is appended as its own record batch
        Arrow.write(writer, df)
    end
end
This completely skips the combining step. If I want a single DataFrame, I simply load the Arrow file back, which is blazingly fast.
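As a minimal sketch of the read-back step (assuming the DataFrames in df_dict_rand share a common schema, since each one becomes a record batch of the same file):

using Arrow, DataFrames

# Arrow.Table reads all record batches; DataFrame collects them into a single table.
df_all = DataFrame(Arrow.Table("test1.arrow"))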


An even better approach is to use Tables.partitioner with Arrow.write, which will use multiple threads to write multiple record batches simultaneously (e.g. if Julia is started with julia -t 8 or the JULIA_NUM_THREADS environment variable is set), as described in the Arrow.jl docs:

parts = Tables.partitioner(values(df_dict_rand))
Arrow.write("test.arrow", parts)
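For completeness, here is a self-contained sketch of that approach; df_dict_rand and its contents are placeholders, and the parallel write assumes Julia was started with multiple threads (e.g. julia -t 8):

using Arrow, DataFrames, Tables

# Hypothetical input: many small DataFrames with identical columns.
df_dict_rand = Dict(i => DataFrame(a = rand(100), b = rand(100)) for i in 1:1_000)

# Each partition becomes its own record batch; multiple batches are
# serialized concurrently when Julia has more than one thread.
parts = Tables.partitioner(values(df_dict_rand))
Arrow.write("test.arrow", parts)

# The file reads back as one table spanning all record batches.
@assert nrow(DataFrame(Arrow.Table("test.arrow"))) == 1_000 * 100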