HTTP Streaming with the Anthropic Claude AI API

Hey Julianners,
Yesterday I got properly stuck on the HTTP streaming functionality and just cannot figure out how to solve it. I suppose the problem is with the body in the POST message. I also cannot do write(response, body), because that gives back an error at the end. I also tried response_stream:

io=IOBuffer();
HTTP.post(url, headers, JSON.json(body); response_stream=io)

but in this case I also get the whole response at once instead of piece by piece.

The example code is here:

using HTTP
using JSON

url = "https://api.anthropic.com/v1/messages"

body = Dict(
    "messages" => [
        Dict("content" => "Hi, tell me a very short story", "role" => "user"),
    ],
    "model" => "claude-3-5-sonnet-20240620",
    "max_tokens" => 256,
    "stream" => true,  # Set to true for streaming
)
headers = Dict(
    "content-type" => "application/json",
    "x-api-key" => ENV["ANTHROPIC_API_KEY"],
    "anthropic-version" => "2023-06-01",
)
function stream_response(url, headers, body)
    HTTP.open("POST", url, headers, body=body) do response
        @show response
        for line in eachline(response)
            @show line
            if startswith(line, "data: ")
                data = JSON.parse(line[6:end])
                if data["type"] == "content_block_delta" && data["delta"]["type"] == "text_delta"
                    println(data["delta"]["text"])
                    flush(stdout)
                end
            end
        end
    end
end
stream_response(url, headers, JSON.json(body))

It should work, since the equivalent curl command works just fine:

curl https://api.anthropic.com/v1/messages \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --data \
'{
  "model": "claude-3-5-sonnet-20240620",
  "messages": [{"role": "user", "content": "Hello, tell me a very short story"}],
  "max_tokens": 256,
  "stream": true
}'

Any idea how to get this working?


I generated a workaround with AI, in case someone else bumps into this issue. (Also note that the data comes in larger batches this way than with the curl version when it is run directly in the terminal.)

stream_anthropic_response(prompt::String; model::String="claude-3-5-sonnet-20240620", max_tokens::Int=256) =
    stream_anthropic_response([Dict("role" => "user", "content" => prompt)]; model, max_tokens)

function stream_anthropic_response(msgs::Vector{Dict{String,String}}; model::String="claude-3-5-sonnet-20240620", max_tokens::Int=256)
    body = JSON.json(Dict(
        "model" => model,
        "messages" => msgs,
        "max_tokens" => max_tokens,
        "stream" => true
    ))
    # -N disables curl's output buffering so events are forwarded as they arrive
    cmd = `curl -sS -N https://api.anthropic.com/v1/messages -X POST -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -H "x-api-key: $(ENV["ANTHROPIC_API_KEY"])" -d $body`

    for line in eachline(cmd)
        startswith(line, "data: ") || continue
        payload = replace(line, r"^data: " => "")
        payload == "[DONE]" && break  # OpenAI-style sentinel; Anthropic ends with a message_stop event instead
        data = JSON.parse(payload)

        if get(get(data, "delta", Dict()), "type", "") == "text_delta"
            print(data["delta"]["text"])
            flush(stdout)
        end
    end
end
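
For example:

stream_anthropic_response("Hi, tell me a very short story")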

I guess the larger batches come from some kind of caching/buffering at some layer. I'd be happy if someone had an idea about it. But of course, maybe Opus 3.5 will solve/answer this anyway… xD

I also reported the above issue on the HTTP.jl GitHub: Issue
I believe HTTP.jl should handle this naturally.

I don't think you can rely on the response arriving split by newlines.

Have you looked at how OpenAI.jl implements it? It should be exactly the same: OpenAI.jl/src/OpenAI.jl at 762daadd3ad35ae1badd47fdc02796ca7bd6c886 · JuliaML/OpenAI.jl · GitHub

In short, you:

  • open a streamed POST request in a do-block,
  • write your body,
  • close the writer,
  • open the reader,
  • iterate with a while loop using eof or a DONE signal,
  • close the reader.
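
A minimal sketch of those steps (generic, with url, headers, and body as above; not specific to any provider):

HTTP.open("POST", url, headers) do io
    write(io, body)          # 2. write the request body into the stream
    HTTP.closewrite(io)      # 3. signal that we're done writing
    HTTP.startread(io)       # 4. start reading the response
    while !eof(io)           # 5. iterate until EOF (or a DONE sentinel)
        chunk = String(readavailable(io))
        # buffer `chunk` and split it into complete events here
    end
    HTTP.closeread(io)       # 6. close the reader
end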

It's the same as in the HTTP.jl docs: Client · HTTP.jl

Differences I see in your first implementation:

  • passing the body directly - I think it has to be written into the stream
  • reading the stream safely with readavailable (the data doesn't always arrive separated by newlines)
  • waiting for DONE (might be an OpenAI thing?)

I was getting an error like:

ERROR: HTTP.RequestError:
HTTP.Request:
HTTP.Messages.Request:
"""
POST /v1/messages HTTP/1.1
anthropic-version: 2023-06-01
content-type: application/json
x-api-key: sk-ant-api03-***REDACTED***
Host: api.anthropic.com
Accept: */*
User-Agent: HTTP.jl/1.10.4
Accept-Encoding: gzip
Transfer-Encoding: chunked

[Message Body was streamed]"""
Underlying error:
IOError: read: connection reset by peer (ECONNRESET)
...

But indeed, OpenAI.jl uses HTTP.closewrite(io) and HTTP.startread(io).

So, following your guide, here is the working code:

using HTTP
using JSON

body = Dict(
    "messages" => [
        Dict("content" => "Hi, tell me a longer story", "role" => "user"),
    ],
    "model" => "claude-3-5-sonnet-20240620",
    "max_tokens" => 256,
    "stream" => true,  # Set to true for streaming
)
headers = Dict(
    "content-type" => "application/json",
    "x-api-key" => ENV["ANTHROPIC_API_KEY"],
    "anthropic-version" => "2023-06-01",
)
HTTP.open("POST", "https://api.anthropic.com/v1/messages", headers; status_exception=false) do io
	write(io, JSON.json(body))
	HTTP.closewrite(io)    # indicate we're done writing to the request
	HTTP.startread(io) 
	while !eof(io)
		println(String(readavailable(io)))
	end
	HTTP.closeread(io)
end
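
Note that this prints the raw SSE chunks, and readavailable can return partial or multiple events at once. To extract just the text, one can buffer across reads and split on the blank-line event delimiter, e.g. (a sketch along the same lines, using the content_block_delta format from my first post):

HTTP.open("POST", "https://api.anthropic.com/v1/messages", headers; status_exception=false) do io
    write(io, JSON.json(body))
    HTTP.closewrite(io)
    HTTP.startread(io)
    buffer = ""
    while !eof(io)
        buffer *= String(readavailable(io))
        events = split(buffer, "\n\n")    # SSE events are separated by a blank line
        buffer = String(events[end])      # keep any trailing partial event for the next read
        for event in events[1:end-1], line in split(event, "\n")
            startswith(line, "data: ") || continue
            data = JSON.parse(line[7:end])
            if get(get(data, "delta", Dict()), "type", "") == "text_delta"
                print(data["delta"]["text"])
                flush(stdout)
            end
        end
    end
    HTTP.closeread(io)
end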

Really, THANK YOU for the help! Beautiful to see that this also works perfectly in Julia!

(Note: Sonnet 3.5 is overloaded like hell… so it is really hard to test and see if it works. But I know it works, because it worked once when their server was up for a second.)


Do I see an Anthropic.jl forming? :disguised_face:


Definitely! :smiley: I will put together a really basic Anthropic.jl quickly, yes. :smiley:

Or, I mean, Anthropic's Sonnet 3.5 will create it itself, for “itself” XD


On the way. Hope it can be more helpful later on :slight_smile:

Thank you for this working example. I've been searching for days just to figure out how to get streaming output from Ollama in Julia :grinning:
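
For anyone else looking, the same closewrite/startread pattern should work against Ollama's local API, e.g. (a sketch; assumes Ollama's /api/generate endpoint, which streams newline-delimited JSON rather than SSE, and a placeholder model name):

using HTTP, JSON

body = JSON.json(Dict(
    "model" => "llama3",   # placeholder model name
    "prompt" => "Tell me a short story",
    "stream" => true,
))
HTTP.open("POST", "http://localhost:11434/api/generate", ["content-type" => "application/json"]) do io
    write(io, body)
    HTTP.closewrite(io)
    HTTP.startread(io)
    buffer = ""
    while !eof(io)
        buffer *= String(readavailable(io))
        lines = split(buffer, "\n")
        buffer = String(lines[end])   # keep any partial line for the next read
        for line in lines[1:end-1]
            isempty(line) && continue
            chunk = JSON.parse(line)  # one JSON object per line, with the text in "response"
            print(get(chunk, "response", ""))
            flush(stdout)
        end
    end
    HTTP.closeread(io)
end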

NOTE: Oh, I've been using your PromptingTools.jl, @svilupp
