Is there a DataFrame that can be memory mapped to a file?

I just saw "Make loading weights 10-100x faster" on Hacker News.

Is there something equivalent in Julia? Is there a way to store a DataFrame in a file so that it loads instantly via mmap?

ChatGPT suggests the following:

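using Mmap, DataFrames

# Assumes `io` is an open file holding each column's raw bytes back to back,
# and that every `T` in `coltypes` is a bits type (e.g. Int64, Float64).
df = DataFrame()
for (name, T) in zip(colnames, coltypes)
    data = Mmap.mmap(io, Vector{T}, (nrows,))    # maps starting at position(io)
    df[!, name] = data                           # stores the mapped vector without copying
    seek(io, position(io) + sizeof(T) * nrows)   # advance to the next column
end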
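To try the idea end to end, here is a minimal sketch that uses only the `Mmap` standard library. The file layout (columns written back to back with no header) and the helper name `mmap_columns` are assumptions made up for illustration; the caller has to know the column names, element types, and row count in advance.

```julia
using Mmap

# Hypothetical layout: columns written back to back as raw bytes, no header.
cols = (a = Int64[1, 2, 3], b = Float64[0.5, 1.5, 2.5])
nrows = length(cols.a)
path = tempname()

# Write each column's raw bytes in order.
open(path, "w") do io
    for col in cols
        write(io, col)
    end
end

# Map each column back from the file (illustrative helper, not a library function).
function mmap_columns(path, names, types, nrows)
    open(path, "r") do io
        offset = 0
        vecs = map(types) do T
            v = Mmap.mmap(io, Vector{T}, (nrows,), offset)
            offset += sizeof(T) * nrows   # next column starts right after this one
            v
        end
        NamedTuple{names}(vecs)
    end
end

mapped = mmap_columns(path, (:a, :b), (Int64, Float64), nrows)
```

With DataFrames.jl loaded, `DataFrame(mapped; copycols = false)` should wrap these vectors without copying them. The file is opened read-only here, so the mapped vectors must not be written to, and this approach only covers bits types (no strings or `missing`).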

FYI, this is covered in subchapter 8.4 of @bkamins’s julia-for-data-analysis-book, which shows how data frames can be stored in different formats, including Apache Arrow.


Yes, using Arrow.jl is a standard way to do it AFAICT.
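A minimal sketch of the Arrow.jl round trip, assuming the Arrow.jl and DataFrames.jl packages are installed; the column names and the temporary path are made up for illustration:

```julia
using Arrow, DataFrames

df = DataFrame(id = [1, 2, 3], x = [0.1, 0.2, 0.3])
path = tempname() * ".arrow"

# Write the data frame in the Arrow format.
Arrow.write(path, df)

# Reading from a file path memory-maps the file instead of loading it eagerly.
tbl = Arrow.Table(path)

# Wrap the Arrow columns in a DataFrame without copying them.
df2 = DataFrame(tbl; copycols = false)
```

Opening the table should be cheap because the column data stays in the file until it is accessed, as long as the file was written uncompressed.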

But is there a way to update the data frame using Arrow.jl? I would like to push! new rows to it.

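Data loaded with Arrow.jl is read-only: the columns are immutable views over the file, so you have to copy them before you can modify them.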
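One way around this, as a sketch that assumes the data fits in memory: copy the Arrow-backed data frame to get ordinary Julia vectors, modify it, and write the result to a new file. The file names and columns here are made up for illustration.

```julia
using Arrow, DataFrames

path = tempname() * ".arrow"
Arrow.write(path, DataFrame(id = [1, 2], x = [0.1, 0.2]))

# copy materializes each Arrow column as a regular, mutable Vector.
df = copy(DataFrame(Arrow.Table(path)))

# Rows can now be appended.
push!(df, (id = 3, x = 0.3))

# Persist the updated table to a new file rather than overwriting the mapped one.
newpath = tempname() * ".arrow"
Arrow.write(newpath, df)
```

The copy gives up the instant-load benefit of mmap, so this is a trade-off rather than a way to update the file in place.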


But is there a way to load huge amounts of data and update & add more data?

You would need to load it in chunks, AFAICT.
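A rough sketch of chunked processing with Arrow.jl, assuming the data was written as several record batches; `Tables.partitioner` and `Arrow.Stream` come from Tables.jl and Arrow.jl, while the helper `sum_x` and the column names are made up for illustration:

```julia
using Arrow, DataFrames, Tables

path = tempname() * ".arrow"
chunks = [DataFrame(id = [1, 2], x = [0.1, 0.2]),
          DataFrame(id = [3, 4], x = [0.3, 0.4])]

# Each partition is written as its own record batch.
Arrow.write(path, Tables.partitioner(chunks))

# Process one record batch at a time instead of holding the whole table.
function sum_x(path)
    total = 0.0
    for batch in Arrow.Stream(path)
        total += sum(DataFrame(batch).x)
    end
    return total
end

result = sum_x(path)
```

Only one batch is materialized as a DataFrame at a time, so peak memory use depends on the batch size rather than on the size of the whole file.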