Is it not possible to send a POST request with more than 268 strings to create_embeddings() in OpenAI.jl?

Hi there.

Is there a time or string limit on POST requests? In R I can embed all 1000 strings from the overview column in a single request, but with OpenAI.jl I currently cannot send more than 268. Is this an issue with the package, or am I missing something? I have already opened an issue, but I thought somebody here might know whether I am doing something wrong.

Thanks!

import Downloads
using CSV, DataFrames, OpenAI
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

r = create_embeddings(
        ENV["OPENAI_API_KEY"],
        horror_movies.overview[1:268]
    )

Here is the error message I get:

caused by: HTTP.Exceptions.StatusError(400, "POST", "/v1/embeddings", HTTP.Messages.Response:
"""
HTTP/1.1 400 Bad Request
Date: Sat, 19 Aug 2023 16:40:11 GMT
Content-Type: application/json
Content-Length: 214
Connection: keep-alive
access-control-allow-origin: *
openai-organization: user-v5inpyrhmnlaw9xjmb03gcs8
openai-processing-ms: 29
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 3000
x-ratelimit-limit-tokens: 1000000
x-ratelimit-remaining-requests: 2999
x-ratelimit-remaining-tokens: 969809
x-ratelimit-reset-requests: 20ms
x-ratelimit-reset-tokens: 1.811s
x-request-id: 748d86208d09d59ee09f9fdb36b5ba27
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: 7f93d638bed96f3d-ATH
alt-svc: h3=":443"; ma=86400

{
  "error": {
    "message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
""")
Stacktrace:
  [1] (::HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}})(stream::HTTP.Streams.Stream{HTTP.Messages.Response, HTTP.Connections.Connection{OpenSSL.SSLStream}}; status_exception::Bool, timedout::Nothing, logerrors::Bool, logtag::Nothing, kw::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ HTTP.ExceptionRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/ExceptionRequest.jl:19
  [2] exceptions
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/ExceptionRequest.jl:13 [inlined]
  [3] (::HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}})(stream::HTTP.Streams.Stream{HTTP.Messages.Response, HTTP.Connections.Connection{OpenSSL.SSLStream}}; readtimeout::Int64, logerrors::Bool, logtag::Nothing, kw::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ HTTP.TimeoutRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/TimeoutRequest.jl:17
  [4] (::HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}})(req::HTTP.Messages.Request; proxy::Nothing, socket_type::Type, socket_type_tls::Type, readtimeout::Int64, connect_timeout::Int64, logerrors::Bool, logtag::Nothing, kw::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ HTTP.ConnectionRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/ConnectionRequest.jl:120
  [5] (::Base.var"#90#92"{Base.var"#90#91#93"{ExponentialBackOff, HTTP.RetryRequest.var"#2#5"{Int64, typeof(HTTP.RetryRequest.FALSE), HTTP.Messages.Request, Base.RefValue{Int64}}, HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}})(args::HTTP.Messages.Request; kwargs::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ Base ./error.jl:296
  [6] (::HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}})(req::HTTP.Messages.Request; retry::Bool, retries::Int64, retry_delays::ExponentialBackOff, retry_check::Function, retry_non_idempotent::Bool, kw::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ HTTP.RetryRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/RetryRequest.jl:75
  [7] manageretries
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/RetryRequest.jl:30 [inlined]
  [8] (::HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}})(req::HTTP.Messages.Request; cookies::Bool, cookiejar::HTTP.Cookies.CookieJar, kw::Base.Pairs{Symbol, Union{Nothing, Int64}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :decompress, :verbose), Tuple{Nothing, Nothing, Int64}}})
    @ HTTP.CookieRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/CookieRequest.jl:42
  [9] managecookies
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/CookieRequest.jl:19 [inlined]
 [10] (::HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}})(req::HTTP.Messages.Request; iofunction::Nothing, decompress::Nothing, basicauth::Bool, detect_content_type::Bool, canonicalize_headers::Bool, kw::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:verbose,), Tuple{Int64}}})
    @ HTTP.HeadersRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/HeadersRequest.jl:71
 [11] defaultheaders
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/HeadersRequest.jl:14 [inlined]
 [12] (::HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}})(req::HTTP.Messages.Request; redirect::Bool, redirect_limit::Int64, redirect_method::Nothing, forwardheaders::Bool, response_stream::Nothing, kw::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:verbose,), Tuple{Int64}}})
    @ HTTP.RedirectRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/RedirectRequest.jl:25
 [13] redirects
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/RedirectRequest.jl:14 [inlined]
 [14] (::HTTP.MessageRequest.var"#makerequest#3"{HTTP.MessageRequest.var"#makerequest#1#4"{HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}}}})(method::String, url::URIs.URI, headers::Vector{Pair{String, String}}, body::IOBuffer; copyheaders::Bool, response_stream::Nothing, http_version::HTTP.Strings.HTTPVersion, verbose::Int64, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP.MessageRequest ~/.julia/packages/HTTP/nn2yB/src/clientlayers/MessageRequest.jl:35
 [15] makerequest
    @ ~/.julia/packages/HTTP/nn2yB/src/clientlayers/MessageRequest.jl:24 [inlined]
 [16] request(stack::HTTP.MessageRequest.var"#makerequest#3"{HTTP.MessageRequest.var"#makerequest#1#4"{HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}}}}, method::String, url::String, h::Vector{Pair{String, String}}, b::IOBuffer, q::Nothing; headers::Vector{Pair{String, String}}, body::IOBuffer, query::Nothing, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HTTP ~/.julia/packages/HTTP/nn2yB/src/HTTP.jl:457
 [17] request(stack::Function, method::String, url::String, h::Vector{Pair{String, String}}, b::IOBuffer, q::Nothing)
    @ HTTP ~/.julia/packages/HTTP/nn2yB/src/HTTP.jl:455
 [18] #request#20
    @ ~/.julia/packages/HTTP/nn2yB/src/HTTP.jl:315 [inlined]
 [19] request (repeats 2 times)
    @ ~/.julia/packages/HTTP/nn2yB/src/HTTP.jl:313 [inlined]
 [20] #request_body#3
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:80 [inlined]
 [21] request_body
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:78 [inlined]
 [22] _request(api::String, provider::OpenAI.OpenAIProvider, api_key::String; method::String, http_kwargs::NamedTuple{(), Tuple{}}, streamcallback::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:model, :input), Tuple{String, Vector{String}}}})
    @ OpenAI ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:136
 [23] _request
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:126 [inlined]
 [24] #openai_request#12
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:163 [inlined]
 [25] openai_request
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:161 [inlined]
 [26] #create_embeddings#20
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:338 [inlined]
 [27] create_embeddings
    @ ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:337 [inlined]
 [28] create_embeddings(api_key::String, input::Vector{String})
    @ OpenAI ~/.julia/packages/OpenAI/jZ9Qc/src/OpenAI.jl:337
 [29] top-level scope
    @ ~/.julia/dev/AssociationMetrics/src/associations/Untitled-1.jl:87

Although horror_movies.overview is a string vector…

I tried different vector sizes, and there seems to be no hard upper bound on the size of the string vector: at one point I managed to embed 700 strings with horror_movies.overview[1:700]. Is there something we as users should know, or is it simply luck related to the traffic limits their server imposes?

However, in R the following code, written by Julia Silge, works every single time for all 1000 overview texts:

library(tidyverse)

set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)

library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)

resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)

embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")

The culprit is that some of your string elements are empty (check index horror_movies.overview[6], for example).

This should do the trick:

import Downloads
using CSV, DataFrames, OpenAI

horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

# 298 overviews remain after removing empties
input = filter(!isempty, horror_movies.overview[1:300])

r = create_embeddings(
    ENV["OPENAI_API_KEY"],
    input
)
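
If you also need to know which rows the embeddings belong to after filtering, something along these lines might help. This is only a sketch: `nonempty_with_indices` is a hypothetical helper (not part of OpenAI.jl), the sample vector is made up, and treating `missing` and whitespace-only entries as invalid is my assumption; only empty strings were confirmed to trigger the 400 error here.

```julia
# Hypothetical helper (not part of OpenAI.jl): keep only entries that are
# neither missing nor blank, and remember their original positions so that
# results can be matched back to the rows of the DataFrame.
function nonempty_with_indices(texts::AbstractVector)
    keep = [i for i in eachindex(texts)
            if !ismissing(texts[i]) && !isempty(strip(texts[i]))]
    return String[texts[i] for i in keep], keep
end

# Made-up sample standing in for horror_movies.overview
overviews = ["A haunted house.", "", "   ", missing, "Zombies attack."]

input, kept = nonempty_with_indices(overviews)
println(length(input), " usable texts at rows ", kept)
```

`input` can then be passed to create_embeddings exactly as in the snippet above, and `kept[j]` gives the original row of the j-th text that was sent.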

Sorry for the delayed answer! I just found the time to test it with 1000 texts… and it works! Thanks a lot!

PS: It might be beneficial to include a note in the package documentation about the importance of filtering out empty strings.
