Skipping a lot of lines in CSV.read() allocates too much memory

aris · February 18, 2024, 2:16pm

@rafael.guerra Here is the code I ended up using (which I corrected slightly from the original post).

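using CSV, DataFrames

open(filepath) do f
    while !eof(f)
        # Skip everything up to the next box-bounds header
        readuntil(f, "ITEM: BOX BOUNDS")
        # skipto=2 drops the remainder of the "ITEM: ..." header line
        boxsize = CSV.read(IOBuffer(readuntil(f, "ITEM: ATOMS")), DataFrame, header=false, skipto=2)
        # Reads up to the next "ITEM: " marker, or to the end of the file for the last frame
        positions = CSV.read(IOBuffer(readuntil(f, "ITEM: ")), DataFrame, header=false, skipto=2)
        # ... process boxsize and positions for this frame here ...
    end
end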
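
For anyone who wants to try the loop without a real dump file, here is a minimal sketch that runs it on a tiny made-up sample and collects every frame. The sample contents, the `read_frames` helper, the `eof` guard, and the explicit `delim=' '` are my own assumptions for illustration; they are not part of the code above, and real LAMMPS output may need other options (for example, if lines carry trailing spaces).

```julia
using CSV, DataFrames

# Hypothetical two-frame sample in the LAMMPS dump layout (made up for this sketch).
sample = """
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 0.5 0.5 0.5
2 1 1.5 1.5 1.5
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 0.6 0.6 0.6
2 1 1.6 1.6 1.6
"""
filepath = tempname()
write(filepath, sample)

# read_frames is a hypothetical helper: same loop as above, but it keeps
# each frame's tables instead of discarding them.
function read_frames(path)
    frames = NamedTuple{(:boxsize, :positions), Tuple{DataFrame, DataFrame}}[]
    open(path) do f
        while !eof(f)
            readuntil(f, "ITEM: BOX BOUNDS")
            eof(f) && break  # assumption: trailing text with no further frame is ignored
            # delim=' ' is set explicitly here rather than relying on auto-detection
            boxsize = CSV.read(IOBuffer(readuntil(f, "ITEM: ATOMS")), DataFrame;
                               header=false, skipto=2, delim=' ')
            positions = CSV.read(IOBuffer(readuntil(f, "ITEM: ")), DataFrame;
                                 header=false, skipto=2, delim=' ')
            push!(frames, (boxsize=boxsize, positions=positions))
        end
    end
    return frames
end

frames = read_frames(filepath)
```

Each element of `frames` then holds one timestep, e.g. `frames[1].positions` for the first frame's atom table. Only one frame's text is held in the intermediate buffer at a time, but the collected frames do accumulate, so for very large dumps it may be better to process each frame inside the loop instead.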

@djholiver Here is a link to a file in the LAMMPS format. For future reference, this is a OneDrive link and, if I’m not mistaken, it will expire in one month. Sorry, I don’t know whether I can keep the link active indefinitely.

As for this comment, that’s just genuine appreciation for everyone who provides me, for free, with the tools to do my job efficiently. I wouldn’t dare ask for something more specific to the LAMMPS format; what’s already available is more than enough.
This is why I made the original post with a generic MRE (here’s the general solution, for anyone who’s lost among all the posts in this thread): to find something that anyone with a similar problem can use.

Nevertheless, I really do appreciate how you guys offer to help with my specific example.
