Streaming CSV reader from an IO type that does not load all data into memory

I am looking for a way to read a stream of CSV rows, i.e. from an IO (or LibuvStream) type, without reading the whole table into memory.

This currently works when reading a file, e.g. with CSV.jl’s CSV.Rows, since the file is read as a memory-mapped file.

However, I would like to read row by row from an arbitrary IO input (such as a stream from S3). Currently all packages seem to call Base.read(io) in these cases, which pulls the entire input into memory.

Couldn’t you read the stream row by row into an in-memory IO buffer and then periodically pass that buffer to CSV.jl for parsing? Sort of a roll-your-own batched processing approach.
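
Something along these lines might work as a starting point. It is only a sketch using Base: `foreach_batch` and `batchsize` are names I made up, and it assumes the first line is the header and that no quoted field contains an embedded newline.

```julia
# Hypothetical helper (not part of CSV.jl): reads `io` line by line and calls
# `process` once per batch with a rewound in-memory IOBuffer.
# Assumption: the first line is the header and no quoted field spans lines.
function foreach_batch(process, io::IO; batchsize::Int = 1000)
    header = readline(io)              # column names, repeated in every batch
    while !eof(io)
        buf = IOBuffer()
        println(buf, header)           # so each batch can be parsed on its own
        n = 0
        while n < batchsize && !eof(io)
            println(buf, readline(io)) # copy one row into the batch buffer
            n += 1
        end
        process(seekstart(buf))        # only this batch is held in memory
    end
end
```

Inside the `do` block you would then hand `batch` to whatever parser you like, for example `CSV.File(batch)`. At most `batchsize` rows are buffered at any time.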


Also, it may be worth pointing out that if you’re reading over a network connection, you will probably get better results by reading a fairly large chunk at a time rather than trying to go line by line.
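
A rough sketch of the chunked variant, again with Base only. `foreach_chunk` and `chunksize` are hypothetical names, and the code assumes that rows end in `\n` and that no quoted field contains a newline. It reads up to `chunksize` bytes at a time, passes on everything up to the last complete line, and carries the partial trailing line over to the next read.

```julia
# Hypothetical helper: reads large chunks from `io` and calls `process` with a
# Vector{UInt8} that always ends on a row boundary (except possibly the last).
# Assumption: rows are terminated by '\n' and quoted fields contain no newlines.
function foreach_chunk(process, io::IO; chunksize::Int = 1 << 20)
    leftover = UInt8[]                            # partial line from the previous read
    while !eof(io)
        data = append!(leftover, read(io, chunksize))
        cut = findlast(==(UInt8('\n')), data)     # end of the last complete row
        if cut === nothing
            leftover = data                       # no full row yet, keep accumulating
        else
            process(data[1:cut])                  # complete rows only
            leftover = data[cut+1:end]            # carry the unfinished row forward
        end
    end
    isempty(leftover) || process(leftover)        # final row without trailing newline
end
```

Note that only the first chunk contains the header line here, so the column names would have to be kept from the first chunk and supplied to the parser for the later ones. Each chunk can be wrapped in `IOBuffer(bytes)` if the parser expects an IO.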
