I expected this to arrive in several buffer-size chunks, but the 9GB file actually seems to be downloaded at once:
```julia
using HTTP

data_url = "https://zenodo.org/records/11549846/files/U2018_CLC2018_V2020_20u1.gpkg?download=1"
chunk_counter = 1
HTTP.open("GET", data_url) do io # Note the SSL support
    while !eof(io)
        global chunk_counter
        println(chunk_counter)
        data = String(read(io)) # read(io) with no byte count reads until EOF
        chunk_counter += 1
    end
end
```
In this case it’s binary data, but is there a way to stream a remote resource in chunks that are guaranteed to end with a newline, so that I can process them with an online algorithm (e.g. train a ML model that supports incremental fitting)?
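To make the question concrete, here is a sketch of the kind of newline-aligned chunking I have in mind, demonstrated on an in-memory `IOBuffer` rather than an HTTP stream; `stream_by_lines` and `process_chunk` are hypothetical names of my own, not an existing API:

```julia
# Read fixed-size byte chunks, but only hand complete lines to the callback:
# bytes after the last newline are carried over into the next read.
function stream_by_lines(io::IO, process_chunk; bufsize = 4)
    carry = UInt8[]  # partial line left over from the previous read
    while !eof(io)
        raw = vcat(carry, read(io, bufsize))
        nl = findlast(==(UInt8('\n')), raw)
        if nl === nothing
            carry = raw                       # no newline yet, keep accumulating
        else
            process_chunk(String(raw[1:nl]))  # complete lines only
            carry = raw[nl+1:end]
        end
    end
    isempty(carry) || process_chunk(String(carry))  # trailing partial line, if any
end

chunks = String[]
stream_by_lines(IOBuffer("hello\nworld\npartial"), c -> push!(chunks, c))
```

Every chunk except possibly the last ends with `'\n'`, so each could be fed to an incremental fitting step; the open question is whether something like this can be done directly on the `io` that `HTTP.open` provides, without buffering the whole response.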