HTTP Streaming Chunked JSON

Hi

I receive streamed new-line delimited JSON data from a REST API.

The data is streamed in chunks, and a full line of JSON may span multiple chunks. So I need to buffer the chunks as they arrive until I have a full line to read.

I have been reading the HTTP documentation but I don’t fully understand how to create an IO buffer for the received chunks.

I think the code should look something like the following:

url = ...
headers = ...

io = ??? # Help Required
with HTTP.open("GET", url, headers) do io
  for line in eachline(io)
    println(line)
  end
end

Any suggestions or help would be very much appreciated.

Kind regards

I haven’t verified this, but I think something like the following should work:

findnewline(bytes) = something(findfirst(==(UInt8('\n')), bytes), 0)
HTTP.open("GET", url, headers) do io
    conn = io.stream
    for bytes in readuntil(conn, findnewline)
        x = JSON3.read(bytes)
        # do stuff with parsed JSON object `x`
    end
end

We could probably define readuntil on the Stream directly so you don’t have to reach in and get the Connection object.
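
To make the contract of the `findnewline` helper above concrete, here is a small standalone check. It assumes the search function is meant to return the 1-based index of the delimiter, or 0 when the bytes seen so far contain no newline, which is how the helper is written. The sample byte strings are made up for illustration.

```julia
# Same helper as above: index of the first '\n' byte, or 0 if there is none.
findnewline(bytes) = something(findfirst(==(UInt8('\n')), bytes), 0)

# Hypothetical data: one complete line followed by the start of the next one.
complete = codeunits("{\"id\":1}\n{\"id\":2")
# Hypothetical data: a chunk that ends before any newline has arrived.
partial = codeunits("{\"id\":2")

println(findnewline(complete))  # position of the newline that ends the first line
println(findnewline(partial))   # 0, meaning "keep reading"
```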

Let me know if you run into issues; I’m working on improving the HTTP.jl docs as we speak, so I’ll try to include something like this if it works for you.

Thank you Jacob

I’ll test it and get back to you shortly.

John

I looked into things a bit more, and I think it may not work quite right: the HTTP.Stream isn’t updated with the number of bytes read, so it may not know when it’s “done”. That said, you may want to refactor the loop to check eof(conn), so that it keeps reading until the underlying connection closes. I’ve opened an issue to properly support this on the Stream object itself.

Thank you Jacob

The following appears to work.

HTTP.open("GET", url, headers) do io
  for line in eachline(io)
    println(line)
  end
end

However, I do get a warning that reading single bytes is inefficient.

Yeah, it’s not ideal and will be a little inefficient. I’ve implemented proper readuntil support on the HTTP.jl master branch. Hoping to get a new release out with that functionality very soon.
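
For anyone who wants to avoid byte-wise reads in the meantime, one option is to buffer whole chunks by hand and only emit complete lines. The following is a minimal sketch of that idea, not HTTP.jl functionality: `LineBuffer` and `feed!` are hypothetical names introduced here, and lines are assumed to end in a bare `\n`.

```julia
# Hypothetical helper type: holds the bytes received since the last newline.
mutable struct LineBuffer
    pending::Vector{UInt8}
end
LineBuffer() = LineBuffer(UInt8[])

# Append `chunk` and return every line that is now complete (without the '\n').
# Bytes after the last newline stay in the buffer until a later chunk finishes them.
function feed!(lb::LineBuffer, chunk::AbstractVector{UInt8})
    append!(lb.pending, chunk)
    lines = String[]
    start = 1
    while true
        stop = findnext(==(UInt8('\n')), lb.pending, start)
        stop === nothing && break
        push!(lines, String(lb.pending[start:stop-1]))
        start = stop + 1
    end
    # Drop everything that was emitted; keep only the unterminated tail.
    deleteat!(lb.pending, 1:start-1)
    return lines
end

# Simulate a JSON line that spans two chunks.
lb = LineBuffer()
first_lines = feed!(lb, codeunits("{\"a\":1}\n{\"b\""))
second_lines = feed!(lb, codeunits(":2}\n"))
foreach(println, vcat(first_lines, second_lines))
```

Inside the `HTTP.open` block this could presumably be driven by `while !eof(io)` with `feed!(lb, readavailable(io))` supplying the chunks, parsing each returned line with `JSON3.read`. That wiring is an assumption and has not been tested against a live server.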


In case others come across this thread: HTTP.jl 1.0 has now been released and includes this functionality, so the right way to achieve the original request is:

findnewline(bytes) = something(findfirst(==(UInt8('\n')), bytes), 0)
HTTP.open("GET", url, headers) do io
    for bytes in readuntil(io, findnewline)
        x = JSON3.read(bytes)
        # do stuff with parsed JSON object `x`
    end
end

Thank you Jacob.