Does eachline load the whole stream/file into memory?

Maxim_Lubov · May 24, 2022, 4:30pm

My problem is the following. I have a huge .csv file stored in an S3 bucket. I don’t want to download it to disk; I just want to read it line by line and process each line as it arrives. For locally stored files I usually use CSV.Rows or for row in CSV.File(). However, these options are unavailable for data stored in S3. So I use the following approach:

open(s3path) do stream
    for line in eachline(stream)
        println(line)
    end
end

My question is: does eachline load only one line into memory at a time, so that I can read huge .csv files without worrying about OOM problems?

ericphanson · May 24, 2022, 5:27pm

I think eachline doesn’t load more than it needs, but I think open(f, ::S3Path) does load the whole thing into memory. That is a known issue with AWSS3; see "Support for streaming `S3Path` data" (Issue #204, JuliaCloud/AWSS3.jl on GitHub).
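
To illustrate the first half of that answer, here is a minimal sketch using only Base Julia and a local temporary file. The sample data and the helper name `sum_values` are made up for illustration, and nothing here touches S3. It shows `eachline` being consumed lazily, so only the current line is held as a `String` while the loop runs. Whether memory stays bounded for an S3 object still depends on how the underlying stream is produced, as noted above.

```julia
# Write a small sample file that stands in for a large CSV (hypothetical data).
path, io = mktemp()
write(io, "id,value\n1,10\n2,20\n3,30\n")
close(io)

# Hypothetical helper: sum the second column while reading one line at a time.
function sum_values(path::AbstractString)
    total = 0
    open(path) do stream
        # eachline returns a lazy iterator; lines are read as the loop advances.
        for (i, line) in enumerate(eachline(stream))
            i == 1 && continue            # skip the header row
            fields = split(line, ',')
            total += parse(Int, fields[2])
        end
    end
    return total
end

total = sum_values(path)
println(total)
rm(path)
```

The same loop works on any `IO` object, so if a truly streaming reader for S3 becomes available, only the `open` call would need to change.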
