[ANN] SemanticCaches.jl: Save Time and Money with Request Caching

Are you tired of repeated LLM API calls slowing down your AI application and eating into your budget? Look no further than SemanticCaches.jl, a caching package designed specifically for LLM workloads.

By caching expensive and slow LLM API calls (which can take up to 20 seconds each!), this package helps reduce the time and money spent on repeated requests. With its semantic similarity-based caching, SemanticCaches.jl ensures that you get the most out of your resources.

How it Works

SemanticCaches.jl offers two types of caching: exact matching (HashCache) and semantic similarity lookup (SemanticCache). The package uses a tiny BERT model to provide fast local embeddings on a CPU, making it suitable for applications with smaller volumes of requests. For shorter inputs, semantic similarity lookup improves your cache hit rate by tolerating typos, small reformulations, and similar variations.
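To make the semantic lookup idea concrete, here is a minimal, self-contained sketch of how similarity-based caching works in principle. This is NOT the package's actual implementation or API: the toy character-trigram `embed` function merely stands in for the tiny BERT model, and the names `ToyCache`, `lookup`, and `store!` are invented for illustration.

```julia
# Toy embedding: character-trigram counts. The real package uses a small BERT
# model for embeddings; this stand-in just makes the sketch self-contained.
function embed(text::AbstractString)
    chars = collect(lowercase(text))   # collect to index Unicode safely
    counts = Dict{String,Int}()
    for i in 1:max(length(chars) - 2, 0)
        tri = String(chars[i:i+2])
        counts[tri] = get(counts, tri, 0) + 1
    end
    return counts
end

# Cosine similarity between two sparse count vectors.
function cosine(a::Dict{String,Int}, b::Dict{String,Int})
    dot = sum(v * get(b, k, 0) for (k, v) in a; init = 0)
    na = sqrt(sum(v^2 for v in values(a); init = 0))
    nb = sqrt(sum(v^2 for v in values(b); init = 0))
    (na == 0 || nb == 0) && return 0.0
    return dot / (na * nb)
end

# A cache entry pairs an input embedding with the cached answer.
struct ToyCache
    items::Vector{Tuple{Dict{String,Int},String}}
end
ToyCache() = ToyCache(Tuple{Dict{String,Int},String}[])

# Return the cached answer for any stored input similar enough to `query`,
# or `nothing` on a cache miss.
function lookup(cache::ToyCache, query; threshold = 0.95)
    emb = embed(query)
    for (e, answer) in cache.items
        cosine(e, emb) >= threshold && return answer
    end
    return nothing
end

store!(cache::ToyCache, query, answer) = push!(cache.items, (embed(query), answer))

# Usage: near-identical inputs hit the cache instead of triggering a new call.
cache = ToyCache()
store!(cache, "What is the capital of France?", "Paris")
lookup(cache, "what is the capital of france?")   # → "Paris" (hit despite casing)
```

The key design point is the threshold: a stricter threshold avoids serving a stale answer for a genuinely different question, while a looser one catches more rephrasings. The package makes this trade-off with proper BERT embeddings rather than the toy trigrams above.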

Key Features

  • Reduces costs by caching expensive API calls
  • Ideal for demos, small user applications, and evals with repeated requests
  • Suitable for applications with smaller volumes of requests (<10k per session or machine)
  • Supports caching HTTP requests with PromptingTools.jl

Get Started

To install SemanticCaches.jl, simply add the package using the Julia package manager:

using Pkg
Pkg.add("SemanticCaches")

Check out the Quick Start Guide to get started with SemanticCaches.jl today!
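To show the overall flow a cache like this enables, here is a hypothetical sketch of exact-match (HashCache-style) caching around a slow call. This is an illustration only, not the package's API (the package's actual interface is documented in the Quick Start Guide); `expensive_llm_call` and `cached_call` are invented names, and the Dict keyed by `hash` stands in for the real HashCache.

```julia
# Hypothetical exact-match cache: a Dict keyed by the hash of the prompt.
const CACHE = Dict{UInt64,String}()

# Stand-in for a slow LLM API call.
function expensive_llm_call(prompt::String)
    sleep(0.1)                                # simulate network latency
    return "Response to: $prompt"
end

function cached_call(prompt::String)
    key = hash(prompt)
    haskey(CACHE, key) && return CACHE[key]   # exact-match cache hit: instant
    result = expensive_llm_call(prompt)       # cache miss: pay the full cost once
    CACHE[key] = result
    return result
end

cached_call("Summarize this document.")   # slow: first call misses the cache
cached_call("Summarize this document.")   # fast: served from the cache
```

Exact matching like this only helps when requests repeat verbatim; the semantic lookup described above extends the same flow to near-identical inputs.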

Try it Today
Try SemanticCaches.jl today and start saving time and money on your GenAI application development!