I’m happy to announce three small Julia packages for calling hosted LLM APIs — one each for Anthropic, Groq, and Google’s Gemini.
They’re deliberately minimal and single-provider: thin HTTP clients sharing one predictable surface, built for long-running batch and pipeline workloads where prompt caching, rate limiting, and cost accounting matter more than interactive prompting or multi-provider abstraction.
The packages:
- AnthropicClient.jl: client for Anthropic’s Messages API, with prompt caching via `cache_control` markers.
- GroqClient.jl: client for Groq’s OpenAI-compatible Chat Completions API, with strict JSON-schema structured output.
- GoogleLLMClient.jl: client for Google’s Gemini Developer API (the `generativelanguage` endpoint, not Vertex AI), with `response_schema` structured output.
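To illustrate what a `cache_control` marker looks like in a Messages API request, here is a minimal, network-free sketch that builds the request body as plain `Dict`s. The helper name `build_messages_body` and its keyword arguments are hypothetical, not AnthropicClient.jl’s API; the body fields (`model`, `max_tokens`, `system`, `messages`) and the `{"type": "ephemeral"}` cache marker follow Anthropic’s documented request format.

```julia
# Hypothetical helper, NOT AnthropicClient.jl's API: builds a JSON-ready body
# for POST /v1/messages as plain Dicts, without touching the network or a key.
function build_messages_body(model::String, system_prompt::String, user_text::String;
                             max_tokens::Int = 1024, cache_system::Bool = true)
    system_block = Dict{String,Any}("type" => "text", "text" => system_prompt)
    if cache_system
        # Marks the end of the cacheable prefix (here: the shared system prompt),
        # so repeated batch calls can reuse it instead of reprocessing it.
        system_block["cache_control"] = Dict("type" => "ephemeral")
    end
    return Dict{String,Any}(
        "model" => model,
        "max_tokens" => max_tokens,
        "system" => [system_block],
        "messages" => [Dict("role" => "user", "content" => user_text)],
    )
end

# "MODEL_ID" is a placeholder; substitute a real model identifier.
body = build_messages_body("MODEL_ID", "You are a terse classifier. <long shared instructions>",
                           "Classify: ...")
```

Keeping body-building pure like this means it can be unit-tested with no API key; a JSON library such as JSON3.jl (`JSON3.write(body)`) would handle serialization before sending.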
All three share the same design:
- `chat` / `chat_async` with keep-alive connection pooling
- a per-client sliding-window requests-per-minute (RPM) semaphore
- per-reply token and USD cost accounting, plus a `Budget` wrapper that throws when the cap is reached
- 429 handling that honors the `retry-after` header, with bounded backoff on 5xx errors
- pure (network-free, key-free) body-building and reply-parsing functions for testing
- API keys are never printed
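Two of those mechanisms, the sliding-window RPM limit and the budget cap, can be sketched in a few lines of plain Julia. This is an illustrative sketch under stated assumptions, not the packages’ implementation: the names `RPMLimiter`, `acquire!`, `Budget`, `charge!`, `BudgetExceeded`, and `reply_cost_usd` are hypothetical, and the prices in any usage are caller-supplied placeholders.

```julia
# Hypothetical names throughout; NOT the API of the three packages.

# --- Sliding-window requests-per-minute limiter ---
mutable struct RPMLimiter
    rpm::Int                  # max requests allowed per window
    window::Float64           # window length in seconds (60.0 for RPM)
    stamps::Vector{Float64}   # start times of requests still inside the window
    lock::ReentrantLock       # shared safely by concurrent tasks
end
RPMLimiter(rpm::Int; window::Float64 = 60.0) =
    RPMLimiter(rpm, window, Float64[], ReentrantLock())

# Blocks the calling task until a slot is free, then records the request.
function acquire!(l::RPMLimiter)
    while true
        wait_s = lock(l.lock) do
            t_now = time()
            # Drop requests that have slid out of the window (order is preserved).
            filter!(t -> t_now - t < l.window, l.stamps)
            if length(l.stamps) < l.rpm
                push!(l.stamps, t_now)
                nothing                            # slot acquired
            else
                first(l.stamps) + l.window - t_now # time until the oldest expires
            end
        end
        wait_s === nothing && return nothing
        sleep(max(wait_s, 0.001))  # sleep outside the lock, then re-check
    end
end

# --- Cost accounting with a hard cap ---
struct BudgetExceeded <: Exception
    spent_usd::Float64
    cap_usd::Float64
end

mutable struct Budget
    cap_usd::Float64
    spent_usd::Float64
    lock::ReentrantLock
end
Budget(cap_usd::Real) = Budget(Float64(cap_usd), 0.0, ReentrantLock())

# Cost of one reply from token counts and per-million-token prices
# (prices are supplied by the caller; check the provider's price list).
reply_cost_usd(input_tokens, output_tokens; in_per_mtok, out_per_mtok) =
    (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000

# Records a reply's cost; throws once cumulative spend exceeds the cap,
# which stops further calls in a batch loop.
function charge!(b::Budget, usd::Real)
    lock(b.lock) do
        b.spent_usd += usd
        b.spent_usd > b.cap_usd && throw(BudgetExceeded(b.spent_usd, b.cap_usd))
    end
    return b
end
```

In a batch job, each task would call `acquire!` before sending a request and `charge!` after parsing the token counts out of the reply. Because both structs guard their state with a lock, one limiter and one budget can be shared by many tasks started with `@async` or `Threads.@spawn`. Retry-after handling is left out here for brevity.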
Other established packages: for a fuller, multi-provider experience (prompt templating, many backends including local models, RAG, conversation management), I think PromptingTools.jl is the established choice; there’s also OpenAI.jl for the OpenAI API and Ollama-based options for local models.
In contrast, these three are a bare-bones, one-API-each alternative for when you mainly need a robust client with caching, rate limits, and cost accounting for production batch work.
Feedback and contributions welcome!