Is there a DataFrame that can be memory mapped to a file?

Just saw "Make loading weights 10-100x faster" on Hacker News.

Is there something equivalent in Julia? Is there a way to store a DataFrame in a file so it loads instantly via mmap?

ChatGPT suggests the following:

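    using DataFrames, Mmap

    # Assumes `io` is an open file holding each column's raw values back to back,
    # and that `colnames`, `coltypes`, and `nrows` are already known.
    df = DataFrame()
    for (name, T) in zip(colnames, coltypes)
        data = Mmap.mmap(io, Vector{T}, (nrows,))  # maps from the current position of io
        df[!, name] = data                         # store the mapped vector without copying
        seek(io, position(io) + sizeof(T) * nrows) # move to the start of the next column
    end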
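For what it's worth, here is a minimal self-contained sketch of that idea, so the round trip can actually be run. The file layout (raw column values written back to back, no header) and the helper names `write_raw` and `mmap_columns` are assumptions made up for illustration, not part of any package. It only works for columns of plain bits types such as `Int64` or `Float64`.

```julia
using DataFrames, Mmap

# Hypothetical helper: write each column's raw bytes back to back.
function write_raw(path, df, colnames)
    open(path, "w") do io
        for name in colnames
            write(io, df[!, name])  # raw bytes of a bits-type column
        end
    end
end

# Hypothetical helper: map each column back without reading it into memory.
function mmap_columns(path, colnames, coltypes, nrows)
    df = DataFrame()
    open(path, "r") do io
        offset = 0
        for (name, T) in zip(colnames, coltypes)
            # Map nrows values of type T starting at byte `offset`.
            df[!, name] = Mmap.mmap(io, Vector{T}, (nrows,), offset)
            offset += sizeof(T) * nrows
        end
    end
    return df
end

src = DataFrame(id = Int64[1, 2, 3, 4], score = Float64[0.5, 1.5, 2.5, 3.5])
colnames = [:id, :score]
coltypes = [Int64, Float64]

path = tempname()
write_raw(path, src, colnames)
df = mmap_columns(path, colnames, coltypes, nrow(src))
```

Since the file is opened read-only, the mapped columns should be treated as read-only too. The column names, types, and row count have to be stored somewhere else, which is one reason a self-describing format is usually nicer.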

FYI, this is covered in subchapter 8.4 of @bkamins’s julia-for-data-analysis-book, where it is shown how data frames can be stored in different formats, including Apache Arrow.


Yes, using Arrow.jl is a standard way to do it AFAICT.
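
A small sketch of what that can look like, assuming Arrow.jl and DataFrames.jl are installed. The example data and the temporary file path are made up for illustration. `Arrow.Table` called with a file path memory-maps the file, and `copycols=false` asks DataFrames not to copy the columns out of it.

```julia
using Arrow, DataFrames

# Example data, invented for this sketch.
src = DataFrame(id = [1, 2, 3], name = ["a", "b", "c"])

path = tempname() * ".arrow"
Arrow.write(path, src)               # write the table in Arrow format

tbl = Arrow.Table(path)              # memory-maps the file
df = DataFrame(tbl; copycols=false)  # wrap the Arrow columns without copying
```

The columns of `df` are Arrow array types backed by the file, so it is safest to treat them as read-only. If you need to modify the data, `copy(df)` should give you ordinary in-memory vectors, at the cost of actually loading everything.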