Just saw "Make loading weights 10-100x faster" on Hacker News.
Is there something equivalent in Julia? Is there a way to store a DataFrame in a file so that it loads instantly via mmap?
ChatGPT suggests the following:
for (name, T) in zip(colnames, coltypes)
    data = Mmap.mmap(io, Vector{T}, (nrows,))
    df[!, name] = data
    seek(io, position(io) + sizeof(T) * nrows)
end
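For reference, the loop above can be turned into a self-contained sketch using only the Mmap standard library. The file layout (columns written back-to-back as raw bytes) and the helper names `write_columns` and `mmap_columns` are hypothetical, and the schema (names, element types, row count) is assumed to be stored somewhere else by the caller. It only works for columns of plain bits types such as Int64 or Float64.

```julia
using Mmap

# Hypothetical layout: each column is written back-to-back as raw bytes.
function write_columns(path, cols::NamedTuple)
    open(path, "w") do io
        for col in cols
            write(io, col)  # raw bytes of a Vector with an isbits element type
        end
    end
    return path
end

# Map each column without reading it; the caller supplies the schema.
function mmap_columns(path, colnames, coltypes, nrows)
    open(path, "r") do io
        vectors = AbstractVector[]
        for T in coltypes
            # Map nrows elements of type T starting at the current position.
            push!(vectors, Mmap.mmap(io, Vector{T}, (nrows,), position(io)))
            # mmap does not advance the stream, so move past this column.
            skip(io, sizeof(T) * nrows)
        end
        NamedTuple{Tuple(colnames)}(Tuple(vectors))
    end
end

path = write_columns(tempname(), (id = collect(1:5), x = [0.5, 1.5, 2.5, 3.5, 4.5]))
cols = mmap_columns(path, (:id, :x), (Int64, Float64), 5)
```

With DataFrames.jl loaded, `DataFrame(cols; copycols = false)` should wrap these mapped vectors without copying them. Because the file is opened read-only here, the mapped columns cannot be modified.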
FYI, this is covered in subchapter 8.4 of @bkamins’s julia-for-data-analysis-book, which shows how data frames can be stored in different formats, including Apache Arrow.
Yes, using Arrow.jl is a standard way to do it AFAICT.
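A minimal sketch of that round trip, assuming Arrow.jl and DataFrames.jl are installed. The file name and the sample data are made up for illustration.

```julia
using Arrow, DataFrames

df = DataFrame(id = [1, 2, 3], x = [0.1, 0.2, 0.3])
path = joinpath(mktempdir(), "data.arrow")  # illustrative file name
Arrow.write(path, df)

tbl = Arrow.Table(path)  # memory-maps the file instead of parsing it eagerly
df2 = DataFrame(tbl)     # a DataFrame view over the Arrow columns
```

Since the file is memory-mapped, opening it should be cheap regardless of its size, and the data is only paged in when it is actually accessed.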
But is there a way to update the data frame using Arrow.jl? I would like to push! new rows to it.
Tables loaded with Arrow.jl are read-only.
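One workaround, sketched here under the assumption that the data fits in memory, is to copy the Arrow columns into ordinary vectors, push! to the copy, and write the result to a new file. The paths and data are illustrative only.

```julia
using Arrow, DataFrames

dir = mktempdir()
path = joinpath(dir, "data.arrow")  # illustrative file names
Arrow.write(path, DataFrame(id = [1, 2, 3], x = [0.1, 0.2, 0.3]))

# copycols = true copies the Arrow columns into ordinary, mutable Vectors.
df = DataFrame(Arrow.Table(path); copycols = true)
push!(df, (id = 4, x = 0.4))

# Write to a different file so the still-mapped original is left untouched.
newpath = joinpath(dir, "data_v2.arrow")
Arrow.write(newpath, df)
```

This gives up the instant-load benefit for the copy, since every column has to be materialized first.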
But is there a way to load huge amounts of data and update & add more data?
You would need to load it in chunks, AFAICT.
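A rough sketch of chunked processing with Arrow.jl, assuming Arrow.jl, DataFrames.jl and Tables.jl are installed. The data is written as several record batches via `Tables.partitioner` and read back one batch at a time with `Arrow.Stream`. The chunk sizes and file name are made up.

```julia
using Arrow, DataFrames, Tables

path = joinpath(mktempdir(), "chunks.arrow")  # illustrative file name

# Three chunks of 10 rows each; each one becomes a separate record batch.
parts = [DataFrame(id = collect((10k + 1):(10k + 10)), x = rand(10)) for k in 0:2]
Arrow.write(path, Tables.partitioner(parts))

# Iterate over the record batches so only one chunk is materialized at a time.
total_rows = 0
last_id = 0
for batch in Arrow.Stream(path)
    chunk = DataFrame(batch; copycols = true)  # mutable copy of this chunk only
    global total_rows += nrow(chunk)
    global last_id = chunk.id[end]
end
```

Each chunk can be updated and written out again on its own, so memory use stays bounded by the chunk size rather than the full data set.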