[ANN]: Three small single-provider LLM API clients: AnthropicClient.jl, GroqClient.jl, GoogleLLMClient.jl

I’m happy to announce three small Julia packages for calling hosted LLM APIs — one each for Anthropic, Groq, and Google’s Gemini.

They’re deliberately minimal and single-provider: thin HTTP clients sharing one predictable surface, built for long-running batch and pipeline workloads where prompt caching, rate limiting, and cost accounting are what actually matter, rather than for interactive prompting or multi-provider abstraction.

The packages:

  • AnthropicClient.jl — client for Anthropic’s Messages API, with prompt caching via cache_control markers.
  • GroqClient.jl — client for Groq’s OpenAI-compatible Chat Completions API, with strict JSON-schema structured output.
  • GoogleLLMClient.jl — client for Google’s Gemini Developer API (the generativelanguage endpoint, not Vertex AI), with response_schema structured output.
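
To give a feel for what a cache_control marker looks like on the wire, here is a minimal sketch of an Anthropic Messages API request body built as a plain Dict. The function name build_body and the model string "example-model" are hypothetical placeholders for illustration only; they are not the actual AnthropicClient.jl API, which may look quite different.

```julia
# Hypothetical helper, not the real AnthropicClient.jl API.
# Builds a Messages API request body whose system prompt is marked cacheable.
function build_body(model::AbstractString, system_text::AbstractString,
                    user_text::AbstractString; max_tokens::Integer=1024)
    return Dict{String,Any}(
        "model" => model,
        "max_tokens" => max_tokens,
        # The system prompt is sent as a list of content blocks so that a
        # cache_control marker can be attached to the block to be cached.
        "system" => [Dict{String,Any}(
            "type" => "text",
            "text" => system_text,
            "cache_control" => Dict("type" => "ephemeral"),
        )],
        "messages" => [Dict{String,Any}("role" => "user", "content" => user_text)],
    )
end

# "example-model" is a placeholder, not a real model identifier.
body = build_body("example-model", "You are a careful grader.", "Grade this answer.")
```

Because the function only builds a Dict, it needs neither a network connection nor an API key, which is the kind of property that makes body-building easy to unit-test.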

All three share the same design: chat / chat_async with keep-alive connection pooling; a per-client sliding-window RPM semaphore; per-reply token and USD cost accounting, with a Budget wrapper that throws when a cap is reached; Retry-After-aware 429 handling with bounded 5xx backoff; and pure (network-free, key-free) body-building and reply-parsing functions for testing. None of them ever prints your API key.
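
The ideas in that list can be sketched as small pure functions. The following is a rough illustration only, under stated assumptions: every name here (Pricing, cost_usd, Budget, BudgetExceeded, charge!, window_wait, retry_delay) is hypothetical, the prices are made-up numbers, and none of it is taken from the packages' actual source. It also assumes the Retry-After header has already been parsed into seconds.

```julia
# All names below are hypothetical; this is not the packages' API.

# Illustrative prices in USD per million tokens (made-up values).
struct Pricing
    input_per_mtok::Float64
    output_per_mtok::Float64
end

# USD cost of one reply, computed from its token counts.
cost_usd(p::Pricing, input_tokens::Integer, output_tokens::Integer) =
    (input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok) / 1_000_000

struct BudgetExceeded <: Exception
    spent::Float64
    cap::Float64
end

mutable struct Budget
    cap::Float64
    spent::Float64
end
Budget(cap::Real) = Budget(Float64(cap), 0.0)

# Record a reply's cost; throw once cumulative spend exceeds the cap.
function charge!(b::Budget, usd::Real)
    b.spent += usd
    b.spent > b.cap && throw(BudgetExceeded(b.spent, b.cap))
    return b.spent
end

# Sliding window: seconds to wait before another request may start, given
# the start times (in seconds) of earlier requests and an RPM limit.
function window_wait(starts::AbstractVector{<:Real}, now::Real, rpm::Integer;
                     window::Real=60.0)
    recent = sort(filter(t -> now - t < window, starts))
    length(recent) < rpm && return 0.0
    # The rpm-th most recent request must age out of the window first.
    blocking = recent[end - rpm + 1]
    return Float64(blocking + window - now)
end

# Delay in seconds before the next attempt, or `nothing` to give up.
# 429: use the server's Retry-After if given; 5xx: capped exponential backoff.
function retry_delay(status::Integer, attempt::Integer;
                     retry_after::Union{Nothing,Real}=nothing,
                     base::Real=1.0, max_delay::Real=30.0, max_attempts::Integer=5)
    attempt >= max_attempts && return nothing
    if status == 429
        return Float64(something(retry_after, base))
    elseif 500 <= status <= 599
        return min(Float64(max_delay), base * 2.0^(attempt - 1))
    end
    return nothing  # other statuses are not retried in this sketch
end
```

For example, with a limit of 2 requests per minute and earlier requests started at t = 0 and t = 10, window_wait([0.0, 10.0], 20.0, 2) returns 40.0, because the request from t = 0 only leaves the 60-second window at t = 60. Keeping these pieces pure means they can be tested without touching the network or a clock.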

Related packages: for a fuller, multi-provider experience (prompt templating, many backends including local models, RAG, conversation management), I think the established choice is PromptingTools.jl. There’s also OpenAI.jl for the OpenAI API, and Ollama-based options for local models.

By contrast, these three packages are a more bare-bones, one-API-each alternative for when you mainly need a robust client with caching, rate limits, and cost accounting for production batch work.

Feedback and contributions welcome!
