HTTP Streaming with the Anthropic Claude AI API

Hey Julianners,
Yesterday I got properly stuck on the HTTP streaming functionality and just cannot figure out how to solve it. I suppose the problem is with the body in the POST message. I also cannot do write(response, body), because that gives back an error at the end. I also tried response_stream:

io=IOBuffer();
HTTP.post(url, headers, JSON.json(body); response_stream=io)

but in this case I also get the whole response at once instead of piece by piece.

The example code is here:

using HTTP
using JSON

url = "https://api.anthropic.com/v1/messages"

body = Dict(
    "messages" => [
        Dict("content" => "Hi, tell me a very short story", "role" => "user"),
    ],
    "model" => "claude-3-5-sonnet-20240620",
    "max_tokens" => 256,
    "stream" => true,  # Set to true for streaming
)
headers = Dict(
    "content-type" => "application/json",
    "x-api-key" => ENV["ANTHROPIC_API_KEY"],
    "anthropic-version" => "2023-06-01",
)
function stream_response(url, headers, body)
    HTTP.open("POST", url, headers, body=body) do response
        @show response
        for line in eachline(response)
            @show line
            if startswith(line, "data: ")
                data = JSON.parse(line[6:end])
                if data["type"] == "content_block_delta" && data["delta"]["type"] == "text_delta"
                    println(data["delta"]["text"])
                    flush(stdout)
                end
            end
        end
    end
end
stream_response(url, headers, JSON.json(body))

It should work, since the equivalent curl command works just fine:

curl https://api.anthropic.com/v1/messages \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --data \
'{
  "model": "claude-3-5-sonnet-20240620",
  "messages": [{"role": "user", "content": "Hello, tell me a very short story"}],
  "max_tokens": 256,
  "stream": true
}'

Any idea how to get this working?


I generated a workaround with AI, in case someone else bumps into this issue. (Also note that the data comes in larger batches this way than with the curl version when it is run directly in the terminal.)

stream_anthropic_response(prompt::String; model::String="claude-3-5-sonnet-20240620", max_tokens::Int=256) =
    stream_anthropic_response([Dict("role" => "user", "content" => prompt)]; model, max_tokens)

function stream_anthropic_response(msgs::Vector{Dict{String,String}}; model::String="claude-3-5-sonnet-20240620", max_tokens::Int=256)
    body = JSON.json(Dict(
        "model" => model,
        "messages" => msgs,
        "max_tokens" => max_tokens,
        "stream" => true
    ))
    # -N disables curl's output buffering so events are forwarded as they arrive
    cmd = `curl -sS -N https://api.anthropic.com/v1/messages -X POST -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -H "x-api-key: $(ENV["ANTHROPIC_API_KEY"])" -d $body`

    for line in eachline(cmd)
        startswith(line, "data: ") || continue
        payload = replace(line, r"^data: " => "")
        payload == "[DONE]" && break  # OpenAI-style sentinel; Anthropic ends with a message_stop event instead
        data = JSON.parse(payload)

        if get(get(data, "delta", Dict()), "type", "") == "text_delta"
            print(data["delta"]["text"])
            flush(stdout)
        end
    end
end
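
For example:

stream_anthropic_response("Hi, tell me a very short story")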

I guess the larger batches come from some kind of caching/buffering at some layer. I'd be happy if someone had an idea about it. But of course, maybe Opus 3.5 will solve/answer this anyway… xD

I also reported the above issue on the HTTP.jl GitHub: Issue
I believe HTTP.jl should handle this naturally.

I don't think you can rely on the response arriving split by newlines.

Have you looked at how OpenAI.jl implements it? It should be exactly the same: OpenAI.jl/src/OpenAI.jl at 762daadd3ad35ae1badd47fdc02796ca7bd6c886 · JuliaML/OpenAI.jl · GitHub

In short, you:

  • open a streamed POST request in a do-block,
  • write your body,
  • close the writer,
  • open the reader,
  • iterate with a while loop using eof or a DONE signal,
  • close the reader.
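
A minimal sketch of those steps (generic, with url, headers, and body as above; not specific to any provider):

HTTP.open("POST", url, headers) do io
    write(io, body)          # 2. write the request body into the stream
    HTTP.closewrite(io)      # 3. signal that we're done writing
    HTTP.startread(io)       # 4. start reading the response
    while !eof(io)           # 5. iterate until EOF (or a DONE sentinel)
        chunk = String(readavailable(io))
        # buffer `chunk` and split it into complete events here
    end
    HTTP.closeread(io)       # 6. close the reader
end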

It's the same as in the HTTP.jl docs: Client · HTTP.jl

Differences I see in your first implementation:

  • passing the body directly - I think it has to be written into the stream
  • reading the stream safely with readavailable (the data doesn't always arrive separated by newlines)
  • waiting for DONE (might be an OpenAI thing?)

I was getting an error like:

ERROR: HTTP.RequestError:
HTTP.Request:
HTTP.Messages.Request:
"""
POST /v1/messages HTTP/1.1
anthropic-version: 2023-06-01
content-type: application/json
x-api-key: sk-ant-api03-***REDACTED***
Host: api.anthropic.com
Accept: */*
User-Agent: HTTP.jl/1.10.4
Accept-Encoding: gzip
Transfer-Encoding: chunked

[Message Body was streamed]"""
Underlying error:
IOError: read: connection reset by peer (ECONNRESET)
...

But indeed, OpenAI.jl uses HTTP.closewrite(io) and HTTP.startread(io).

So, following your guide, here is the working code:

using HTTP
using JSON

body = Dict(
    "messages" => [
        Dict("content" => "Hi, tell me a longer story", "role" => "user"),
    ],
    "model" => "claude-3-5-sonnet-20240620",
    "max_tokens" => 256,
    "stream" => true,  # Set to true for streaming
)
headers = Dict(
    "content-type" => "application/json",
    "x-api-key" => ENV["ANTHROPIC_API_KEY"],
    "anthropic-version" => "2023-06-01",
)
HTTP.open("POST", "https://api.anthropic.com/v1/messages", headers; status_exception=false) do io
	write(io, JSON.json(body))
	HTTP.closewrite(io)    # indicate we're done writing to the request
	HTTP.startread(io) 
	while !eof(io)
		println(String(readavailable(io)))
	end
	HTTP.closeread(io)
end
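
Note that this prints the raw SSE chunks, and readavailable can return partial or multiple events at once. To extract just the text, one can buffer across reads and split on the blank-line event delimiter, e.g. (a sketch along the same lines, using the content_block_delta format from my first post):

HTTP.open("POST", "https://api.anthropic.com/v1/messages", headers; status_exception=false) do io
    write(io, JSON.json(body))
    HTTP.closewrite(io)
    HTTP.startread(io)
    buffer = ""
    while !eof(io)
        buffer *= String(readavailable(io))
        events = split(buffer, "\n\n")    # SSE events are separated by a blank line
        buffer = String(events[end])      # keep any trailing partial event for the next read
        for event in events[1:end-1], line in split(event, "\n")
            startswith(line, "data: ") || continue
            data = JSON.parse(line[7:end])
            if get(get(data, "delta", Dict()), "type", "") == "text_delta"
                print(data["delta"]["text"])
                flush(stdout)
            end
        end
    end
    HTTP.closeread(io)
end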

Really, THANK YOU for the help! Beautiful to see that this also works perfectly in Julia!

(Note: Sonnet 3.5 is overloaded like hell… so it is really hard to test and see if it works. But I know it works, because it worked once when their server was up for a second.)


Do I see an Anthropic.jl forming? :disguised_face:


Definitely! :smiley: I will put together a really basic Anthropic.jl quickly, yes. :smiley:

Or, I mean, Anthropic's Sonnet 3.5 will create it itself, for “itself” XD


On the way. Hope it can be more helpful later on :slight_smile:

Thank you for this working example. I've been searching for days just to figure out how to get streaming output from Ollama in Julia :grinning:
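
For anyone else looking, the same closewrite/startread pattern should work against Ollama's local API, e.g. (a sketch; assumes Ollama's /api/generate endpoint, which streams newline-delimited JSON rather than SSE, and a placeholder model name):

using HTTP, JSON

body = JSON.json(Dict(
    "model" => "llama3",   # placeholder model name
    "prompt" => "Tell me a short story",
    "stream" => true,
))
HTTP.open("POST", "http://localhost:11434/api/generate", ["content-type" => "application/json"]) do io
    write(io, body)
    HTTP.closewrite(io)
    HTTP.startread(io)
    buffer = ""
    while !eof(io)
        buffer *= String(readavailable(io))
        lines = split(buffer, "\n")
        buffer = String(lines[end])   # keep any partial line for the next read
        for line in lines[1:end-1]
            isempty(line) && continue
            chunk = JSON.parse(line)  # one JSON object per line, with the text in "response"
            print(get(chunk, "response", ""))
            flush(stdout)
        end
    end
    HTTP.closeread(io)
end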

NOTE: Oh, I've been using your PromptingTools.jl, @svilupp
