RAGTools (from PromptingTools.jl) – Seeking Your Feedback on Next Steps

Introduction & Audience

Hello everyone! I’d like to open this thread to gather real-world feedback on the RAGTools experimental submodule currently included in PromptingTools.jl. Specifically, this is for people who have actually used or experimented with RAGTools in their workflows. If you have experience integrating, customizing, or maintaining RAGTools in your projects, we’d love to hear from you. (Of course, if you’re simply curious, you’re more than welcome to chime in, too!)

What Are We Looking For?

  • Your Use Cases: How are you using RAGTools now? Which features do you rely on the most?
  • Pain Points: Anything that feels cumbersome or overly complex?
  • Feature Requests: Anything you desperately need that’s missing? Or changes that would smooth out your workflow?
  • Stability Concerns: Let us know if there’s something we shouldn’t break to avoid major disruption in your projects.

Ultimately, the plan is to spin RAGTools out into a separate package under the JuliaGenAI organization for better governance and continuity. Before we make that transition, we want to hear what matters most to you. Also, if you have any specific views on how this transition should happen, please share!


Background

RAGTools has been part of PromptingTools.jl for some time. While it has remained stable, it’s grown into an area that many people rely on for retrieval-augmented generation (RAG) workflows. We’d like to:

  1. Carve it out into a dedicated package so it can evolve more rapidly and independently.
  2. Simplify dependencies (/extensions) in PromptingTools.jl by reducing the extra overhead RAGTools introduces, and by removing the need to explicitly import a ton of packages just to trigger the extensions.
  3. Simplify how to configure and customize RAGTools for different RAG pipelines without making the codebase unwieldy. Also, ideally make it easier for everyone to build on top of it.

Since this library is already used in generative AI workflows, we want to ensure the transition and further development reflect the community’s needs.


Proposed Changes & My Wish List

1) Separate Repository

  • Why?
    • Reduce the size and complexity of PromptingTools.jl.
    • Allow RAGTools to move at its own pace without being tied to PromptingTools releases.
    • Make dependency management clearer: one place to import for RAG features rather than juggling multiple extensions.
  • What We Need from You
    • Feedback on how you currently import and use RAGTools. Would changing your import path be a big hassle?

2) Easier Configuration

  • The Challenge
    • Right now, there’s a lot of dispatching around types and a tangle of keyword arguments that are function-specific (aligned to each stage/step of the RAG pipeline).
    • Complex RAG pipelines may have multiple nested steps (often 2+ layers), which leads to confusion about where or how to specify parameters (e.g., embedding model vs. chat model), and nested named tuples are really inconvenient to work with.
  • Possible Approaches
    1. Move Kwargs to Dispatch Types
      • Each dispatch type in RAGTools would define its own specialized fields, making it clearer what arguments exist.
      • Potentially less “magic,” but more explicit and possibly easier to read. It does create a lot of duplication though (e.g., a verbosity kwarg on every type…); see the sketch after this list.
    2. Nested Context Libraries (like EasyContext)
      • Provide a neat interface for hierarchical configuration.
      • Possibly simpler for end users to set everything up in one place.
  • Trade-offs
    • Neither of the above solves the challenge of changing a “model”: depending on the pipeline, you might want to update 1 step or 4 steps, and a model can serve embedding, planning, generation, or modality translation/audio…, so there is a lot of ambiguity, but also a lot of potential for overengineering…
    • Too much “magic” can hide what’s happening and make customizations harder.
    • Too little abstraction means more lines of code for common tasks.
    • We’d love to hear your thoughts on the balance between ease-of-use and flexibility.
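
To make approach 1 concrete, here is a minimal sketch of what “kwargs moved into dispatch types” could look like. All type and field names below are hypothetical, not the current RAGTools API:

```julia
# Hypothetical sketch of approach 1: each pipeline step carries its own
# typed fields instead of receiving them via nested kwargs.
abstract type MyEmbedderStep end
abstract type MyGeneratorStep end

Base.@kwdef struct MyBatchEmbedder <: MyEmbedderStep
    model::String = "text-embedding-3-small"
    batch_size::Int = 100
    verbose::Bool = true   # duplicated on every step type (the trade-off above)
end

Base.@kwdef struct MyGenerator <: MyGeneratorStep
    model::String = "gpt-4o-mini"
    temperature::Float64 = 0.0
    verbose::Bool = true
end

# A pipeline is then a composition of explicitly configured steps:
Base.@kwdef struct MyPipeline
    embedder::MyEmbedderStep = MyBatchEmbedder()
    generator::MyGeneratorStep = MyGenerator()
end

cfg = MyPipeline(embedder = MyBatchEmbedder(model = "voyage-code-2", batch_size = 32))
```

The upside is that every available argument is visible in the struct definition; the downside is exactly the duplication noted above.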

3) Removing FlashRank dep

  • I have used FlashRank.jl many times for quick local ranking (when you fuse embedding- and keyword-based retrieval), but I appreciate that most people might be calling APIs, so it might be a completely unnecessary dependency (it depends on ONNXRunTime.jl and a few packages for tokenization).

Other Areas to Discuss

  • Code Organization & Cleanup: Are there parts of RAGTools that feel scattered or disorganized? I know Pablo V. has some thoughts there!
  • Context Handling: Does the current approach for feeding context into generative models work for you? @Sixzero will have some thoughts here
  • Feature Stability: Are there features or APIs you depend on that need to remain backward compatible? Does anything mentioned above concern you?

Moving Forward (Most likely)

The first step is to move RAGTools into its own repository under the JuliaGenAI organization once we have a clearer idea of the community’s needs. We envision a short transition period, after which PromptingTools.jl will no longer bundle RAGTools. Existing code will need a slight adjustment (likely just a new import path), but we aim to keep the core functionality stable.


We Want Your Feedback!

  • Have you tried the RAG approach in your own projects?
    Tell us what you liked or disliked.
  • Any must-have features or improvements?
    Don’t hold back—let’s make it the best tool possible. But please note that we are NOT looking to match all features of Python libraries :slight_smile:
  • Worried about something changing?
    Let us know so we can minimize disruption or plan a smooth deprecation path.

Thanks in advance for sharing your thoughts and experiences!

EDIT: If you’re not familiar with it, you can find a quick intro here


I’ll try not to make this answer too long… but I want to respond to everything.

Absolutely agree! DO the modularization! There is, IMO, a huge need for it!
How to accomplish it is already worth a talk of its own! It is not yet clear what the future limitations and advantages would be if we go in direction X or Y; I am currently also in the process of trying to modularize and make things as “open” as I can.

My Use Cases
In EasyContext, the current way of calling PromptingTools embedders is to wrap them in another struct, so we can dispatch to other things before calling the wrapped struct (see the sketch after the list of reasons below).
Reason:

  • Batching is a somewhat opinionated thing IMO, especially around API rate-limit handling. Handling rate limits properly, both RPM (requests per minute) and TPM (tokens per minute), is something worth noting.
  • Also, some APIs have different max batch sizes and max tokens per full batch (like Voyage, where it is quite a low number), even though the embedder itself can handle 32k tokens per embedding.
  • Also, there is a need to cache all the embedding calls (for speed and cost reduction). This needs JUST one good solution… the current one in EasyContext is somewhat good, but there are so many edge cases we still need to think about… it’s still not finished.
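
A rough sketch of the wrapper pattern described above (these names are mine, not EasyContext’s actual API):

```julia
# Hypothetical sketch: intercept the embedding call to add caching,
# then delegate to the wrapped inner embedder.
struct CachedEmbedder{E}
    inner::E
    cache::Dict{String, Vector{Float64}}
end
CachedEmbedder(inner) = CachedEmbedder(inner, Dict{String, Vector{Float64}}())

# `my_embed` stands in for whatever interface the wrapped embedder implements.
function my_embed(e::CachedEmbedder, docs::Vector{String})
    misses = unique([d for d in docs if !haskey(e.cache, d)])
    # This is also the one spot to insert batching and RPM/TPM rate limiting.
    for (doc, emb) in zip(misses, my_embed(e.inner, misses))
        e.cache[doc] = emb
    end
    return [e.cache[d] for d in docs]
end

# A dummy inner embedder so the sketch runs end-to-end:
struct DummyEmbedder end
my_embed(::DummyEmbedder, docs::Vector{String}) = [rand(3) for _ in docs]

my_embed(CachedEmbedder(DummyEmbedder()), ["a", "b", "a"])  # "a" embedded once
```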

EasyContext.jl is an experimental thing, but I think many things could already be passed over to RAGTools… Voyage and Jina embedders and some others have also been added. Also, I think RerankGPT could have quite a few different versions… it is quite an opinionated thing: the order in which it processes contexts, and the different ways the context batches can be filled up. We could fill them so that they all contain approximately the same number of tokens, or we could group chunks in the order they came in (hoping that chunks which come one after the other belong together, which would ease the selection for the reranker); a sketch of the latter is below. Also, “dscode” and different models have different max-context limitations, for which it would be cool if we could dispatch (“Feature request 2” could help).
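
A rough sketch of the order-preserving batch-filling idea (the function name and the crude token estimate are made up for illustration):

```julia
# Hypothetical sketch: fill reranker batches in arrival order, starting a new
# batch once a token budget would be exceeded, so adjacent chunks stay together.
ntokens(chunk::String) = cld(length(chunk), 4)   # crude ~4 chars/token estimate

function fill_batches(chunks::Vector{String}; max_tokens::Int = 4000)
    batches, current, used = Vector{Vector{String}}(), String[], 0
    for c in chunks
        t = ntokens(c)
        if !isempty(current) && used + t > max_tokens
            push!(batches, current)
            current, used = String[], 0
        end
        push!(current, c)
        used += t
    end
    isempty(current) || push!(batches, current)
    return batches
end
```

The max_tokens budget is exactly where per-model dispatch on the different max-context limits could plug in.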

AISH.jl just uses things from these libraries… it’s basically workflows.

Separate Repository
Can ONLY agree!
Changing some imports is NOTHING compared to the win.

Easier Config
I would prefer the dispatch types/configs… and can only agree with the trade-offs. I think my biggest fear with the current kwargs approach is that I set a kwarg that went to the wrong API, where it didn’t error, and no one in the world would ever realize that temperature=0.1 was never actually set; you just think it was. This has happened to me many times. Getting a runtime error is better than realizing in benchmarks, after spending $1-5, that the results are not really different because some settings didn’t go through. A contrived illustration is below.
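
Here is the failure mode in miniature, with hypothetical function and type names:

```julia
# With open-ended kwargs, a misdirected or misspelled setting is silently dropped:
run_step(; model = "gpt-4o", kwargs...) = model   # `temperature` is never read
run_step(model = "gpt-4o", temperatuer = 0.1)     # typo: no error, no effect

# With a typed config, the same mistake fails loudly at construction time:
Base.@kwdef struct MyGenConfig
    model::String = "gpt-4o"
    temperature::Float64 = 1.0
end
MyGenConfig(temperatuer = 0.1)   # throws: no method matching this keyword
```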

Ideal dream world scenario

I think the ideal scenario would be if everything were extensible, and things were developed in a way that lets us build separate packages extending the current CORE functionalities. This is where I think dispatch might help us, but who knows what the best solution would be.

Another ideal would be if we could use simple Agents with these tools/things to create whatever other code anyone needs. Getting the structure and naming as natural-sounding and as straightforward as we can matters quite a lot for this!

Moving forward
I have a minor proposal for the modularization.
In EasyContext there are now 4 kinds of things (I call them “things” because I have no better word… I hope there will be one… it’s just a collection of tools/things, but I am not a fan of the word “tools”, because tools are a separate concept in the LLM context… actually I like these Things now xD after writing it down so many times xD I wonder what better idea we can come up with to make agents work with these more easily):
AgentThings, ToolThings that agents can use, ChatThings (like managing files in chats and cutting chat history, or summarizing chats that get too long), and the RAGThings context-creation stuff, which I think is eventually a RAGTools thing, but I don’t mind where this code lives…

A minor question… agents essentially work in a chat-like manner… so how separate can ChatThings really be from AgentThings?

3) Removing FlashRank dep
Agree; I use BM25 keyword search exactly as it is in PromptingTools, and I don’t use FlashRank.

Other Areas to Discuss
Context Handling
I think EasyContext likes the way you developed many parts of the system; I would say we need more Chunker ideas… Also, after being in this field for so long, I have realized that contexts shouldn’t be simplified into just strings. All we do is concat and cut strings to get new strings… it looks easy, since we can expect everything to be just a String, right? But when you want to track the history of something, or where it came from… if you want to handle the overlap of chunks… if you want to, for example, track how a file changes and then update the content of a chunk… or if a chunk is a function definition in a file and its line number changes because someone inserted a row above the function you want to track… then this “just strings” concept starts to need so much engineering to make the system work; things just become complicated…
Of course, these issues only arise if you track workspaces or package functionalities, but there are also things like emails/websites/chats (all of these could just stay strings, though websites also have a cache policy and sometimes their content changes… chats… ah…). I might be overthinking it a little bit now, but IMO we need to think over all the use cases… A sketch of a metadata-carrying chunk is below.
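
Something in this direction, as a sketch (all names hypothetical, not an existing API):

```julia
# Hypothetical sketch: a chunk that carries provenance instead of being a bare
# String, so overlaps, file edits, and shifted line numbers can be reconciled.
Base.@kwdef struct SourceChunk
    content::String
    source::String                  # file path, URL, chat/message id, ...
    from_line::Int = 0              # can be re-anchored when lines shift
    to_line::Int = 0
    content_hash::UInt = hash("")   # detect stale chunks when the source changes
end

# Degrades gracefully wherever a plain string is expected:
Base.string(c::SourceChunk) = c.content

chunk = SourceChunk(content = "function foo() ... end",
                    source = "src/foo.jl", from_line = 10, to_line = 12,
                    content_hash = hash("function foo() ... end"))
```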

Feature request 1
Do the modularization and yeah I tried to mention as many things as I could! :smiley:

Feature request 2
One other concern I recently realized: we cannot optionally call a model with as many parameters as we want. If I write a function and let users pass the model as an argument, and later someone also wants to pass temperature or top_n or anything else, then I need to change that function’s signature to also support passing temperature and top_n. I would prefer the model keyword to accept not only a model::String but also a model::ModelConfig, so api_kwargs would get moved into it.
Actually, I think it is a less lightweight solution, since we would use structs, and struct abstractions IMO are not that easily optimized by compilers, but I am starting to think it will be worth the sacrifice for ease of use.
Passing a model config should be a one-argument thing in case we want to give the user the option to specify it; the optional params should live inside this OpenAIModelConfig or AnthropicModelConfig. This might also drop the need for OpenAIFlavor and many other flavors…, as the model type is already there.
Also, right now, if we define a new model, we cannot really specify the Flavor… the thing is burned into a function… But I hope we can get rid of the Flavor concept. If aigenerate ultimately used an AbstractModelConfig instead of api_kwargs, things would get a lot simpler IMO; we can wire everything so that ModelConfigs get used. Although it is debatable whether api_kwargs or a struct should be the thing from which we generate the REST headers and params for the REST call, I would vote for the struct option. I feel like giving users a dispatch hook just before the REST params are generated would be really cool extensibility; we could fix things easily on our side and later upstream a PR to the main repo. A rough sketch is below.
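
A rough sketch of what this could look like; note that none of these types or functions exist in PromptingTools today, the names just follow the ones used above:

```julia
# Hypothetical sketch of feature request 2: `model` accepts either a plain
# String or a typed config that carries the api_kwargs with it.
abstract type AbstractModelConfig end

Base.@kwdef struct OpenAIModelConfig <: AbstractModelConfig
    name::String
    temperature::Float64 = 1.0
    top_p::Float64 = 1.0
end

# The config type replaces the Flavor: REST params come from dispatch, which is
# also the extensibility hook just before the call is made.
rest_params(c::OpenAIModelConfig) =
    Dict("model" => c.name, "temperature" => c.temperature, "top_p" => c.top_p)

# Downstream functions need only one argument, whatever the user wants to tune:
function my_pipeline(prompt; model::Union{String, AbstractModelConfig} = "gpt-4o")
    cfg = model isa String ? OpenAIModelConfig(name = model) : model
    return rest_params(cfg)   # would feed the actual REST call
end

my_pipeline("hi"; model = OpenAIModelConfig(name = "gpt-4o", temperature = 0.1))
```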

Feature request 3
Sorry, I just realized that some APIs expose the same feature under stop_sequence while others use stop, and some accept a vector of strings while others accept just one string. So I wonder if we could make this universal. Also, on the return side… some APIs report whether they stopped on the stop_sequence, while others just return “done” in the stop field… xD this is… so not cool… but the one which returns “done” accepts only 1 keyword, so there we at least know what the stopping keyword was. These cases should also be handled, which I think gets solved by feature request 2. A small sketch of the normalization idea is below.
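
For example, the normalization side could look something like this (the provider symbols and field names below are placeholders, not real API schemas):

```julia
# Hypothetical sketch: normalize the user's `stop` input once, up front...
normalize_stop(s::AbstractString) = [String(s)]
normalize_stop(v::AbstractVector{<:AbstractString}) = String.(v)

# ...then let per-provider dispatch emit the field each API expects:
stop_params(::Val{:vector_style}, stop) = Dict("stop" => stop)
stop_params(::Val{:single_style}, stop) = Dict("stop_sequence" => first(stop))

stop_params(Val(:vector_style), normalize_stop("END"))
```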

Feature request 4
Yeah, another sorry: I wonder if I am seeing things correctly, but I think aigenerate has too many dispatch configurations… I have seen at least 2 well-separated things in there which I think should get separate names. The point is that I want to know a little better which one is getting called… I was not sure whether it was the schema or the rendered thing… or where it was… I am mentioning this because aigenerate is, IMO, a little bit hard to follow; I hope this will get simpler with feature request 2. :slight_smile:

Also, one minor thing
In user_preferences, the list of models requires us to specify the model name twice, which I think could be eliminated.