RAGTools (from PromptingTools.jl) – Seeking Your Feedback on Next Steps

Introduction & Audience

Hello everyone! I’d like to open this thread to gather real-world feedback on the RAGTools experimental submodule currently included in PromptingTools.jl. Specifically, this is for people who have actually used or experimented with RAGTools in their workflows. If you have experience integrating, customizing, or maintaining RAGTools in your projects, we’d love to hear from you. (Of course, if you’re simply curious, you’re more than welcome to chime in, too!)

What Are We Looking For?

  • Your Use Cases: How are you using RAGTools now? Which features do you rely on the most?
  • Pain Points: Anything that feels cumbersome or overly complex?
  • Feature Requests: Anything you desperately need that’s missing? Or changes that would smooth out your workflow?
  • Stability Concerns: Let us know if there’s something we shouldn’t break to avoid major disruption in your projects.

Ultimately, the plan is to carve RAGTools out into a separate package under the JuliaGenAI organization for better governance and continuity. Before we make that transition, we want to hear what matters most to you. Also, if you have any specific views on how this transition should happen, please share!


Background

RAGTools has been part of PromptingTools.jl for some time. While it has remained stable, it’s grown into an area that many people rely on for retrieval-augmented generation (RAG) workflows. We’d like to:

  1. Carve it out into a dedicated package so it can evolve more rapidly and independently.
  2. Simplify dependencies (/extensions) in PromptingTools.jl by reducing the extra overhead RAGTools introduces, while also removing the need to explicitly import a tonne of packages just to trigger the extensions.
  3. Simplify how to configure and customize RAGTools for different RAG pipelines without making the codebase unwieldy. Also, ideally make it easier for everyone to build on top of it.

Since this library is already used in generative AI workflows, we want to ensure the transition and further development reflect the community’s needs.


Proposed Changes & My Wish List

1) Separate Repository

  • Why?
    • Reduce the size and complexity of PromptingTools.jl.
    • Allow RAGTools to move at its own pace without being tied to PromptingTools releases.
    • Make dependency management clearer: one place to import for RAG features rather than juggling multiple extensions.
  • What We Need from You
    • Feedback on how you currently import and use RAGTools. Would changing your import path be a big hassle?

2) Easier Configuration

  • The Challenge
    • Right now, there’s a lot of dispatching around types and a tangle of function-specific keyword arguments (aligned to each stage/step of the RAG pipeline).
    • Complex RAG pipelines may have multiple nested steps (often 2+ layers), which leads to confusion about where and how to specify parameters (e.g., embedding model vs. chat model), and nested named tuples are really inconvenient.
  • Possible Approaches
    1. Move Kwargs to Dispatch Types
      • Each dispatch type in RAGTools would define its own specialized fields, making it clearer what arguments exist.
      • Potentially less “magic”, more explicit, and possibly easier to read. It creates a lot of duplication, though (e.g., the verbosity kwarg…). See the sketch after this list.
    2. Nested Context Libraries (like EasyContext)
      • Provide a neat interface for hierarchical configuration.
      • Possibly simpler for end users to set everything up in one place.
  • Trade-offs
    • Neither of the above solves the challenge of changing a “model”: depending on the pipeline, you might want to update one step or four, and a model can be used for embedding, planning, generation, modality translation/audio, etc., so there is a lot of ambiguity but also a lot of potential for overengineering.
    • Too much “magic” can hide what’s happening and make customizations harder.
    • Too little abstraction means more lines of code for common tasks.
    • We’d love to hear your thoughts on the balance between ease-of-use and flexibility.
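
To make approach 1 concrete, here is a minimal hypothetical sketch (the field sets are made up for illustration, not the actual RAGTools API) of configuration living on the dispatch types, next to today’s nested-kwargs style:

    # Hypothetical illustration only -- not the actual RAGTools API.
    abstract type AbstractEmbedder end

    Base.@kwdef struct BatchEmbedder <: AbstractEmbedder
        model::String = "text-embedding-3-small"
        batch_size::Int = 128
        verbose::Bool = true   # duplicated on every step type: the downside noted above
    end

    Base.@kwdef struct SimpleRetriever
        embedder::BatchEmbedder = BatchEmbedder()
        top_k::Int = 100
    end

    # vs. today's style, roughly: behaviour picked by the type, parameters by nested kwargs,
    # e.g. retrieve(retriever, index, question; embedder_kwargs = (; model = "text-embedding-3-small"))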

3) Removing FlashRank dep

  • I have used FlashRank.jl many times for quick local ranking (when you fuse embedding- and keyword-based retrieval), but I appreciate that most people are probably calling APIs, so it might be a completely unnecessary dependency (it depends on ONNXRunTime and a few packages for tokenization).

Other Areas to Discuss

  • Code Organization & Cleanup: Are there parts of RAGTools that feel scattered or disorganized? I know Pablo V. has some thoughts there!
  • Context Handling: Does the current approach for feeding context into generative models work for you? @Sixzero will have some thoughts here
  • Feature Stability: Are there features or APIs you depend on that need to remain backward compatible? Does anything mentioned above concern you?

Moving Forward (Most likely)

The first step is to move RAGTools into its own repository under the JuliaGenAI organization once we have a clearer idea of the community’s needs. We envision a short transition period, after which PromptingTools.jl will no longer bundle RAGTools. Existing code will need a slight adjustment (likely just a new import path), but we aim to keep the core functionality stable.


We Want Your Feedback!

  • Have you tried the RAG approach in your own projects?
    Tell us what you liked or disliked.
  • Any must-have features or improvements?
    Don’t hold back—let’s make it the best tool possible. But please note that we are NOT looking to match all features of Python libraries :slight_smile:
  • Worried about something changing?
    Let us know so we can minimize disruption or plan a smooth deprecation path.

Thanks in advance for sharing your thoughts and experiences!

EDIT: If you’re not familiar with it, you can find a quick intro here

8 Likes

I’ll try not to answer at too much length here… but I want to answer everything.

Absolutely agree, do the modularization! There is, IMO, a huge need for it!
How to accomplish it is already worth a talk! Maybe it is not yet clear what the future limitations and advantages are if we go in direction X or Y; I am currently also in the process of trying to modularize and make things as “open” as I can.

My Usecases
In EasyContext, the current way of calling PromptingTools’ embedders is to wrap them in another struct, so we can dispatch to other things before calling the wrapped struct (see the sketch after this list).
Reason:

  • Batching is a somewhat opinionated thing, IMO, along with the API rate-limit handling. Handling rate limits (RPM and TPM, tokens per minute) properly is something worth noting.
  • Some APIs also have different max batch sizes, as well as max tokens per full batch (like Voyage, where it is quite a low number), even though the embedder itself can handle 32k tokens per embedding.
  • There is also a need to cache all the embedding calls (for speed and cost reduction). This needs JUST one good solution… the current one in EasyContext is somewhat good, but there are so many edge cases we still need to think about… it is still not finished.
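
For illustration, a minimal sketch of that wrapping pattern, assuming RAGTools’ AbstractEmbedder / get_embeddings extension points (the wrapper type and caching details are made up; this is not the EasyContext implementation):

    using PromptingTools
    const RT = PromptingTools.Experimental.RAGTools

    # Made-up wrapper: check a cache first, then delegate to the wrapped embedder.
    struct CachedEmbedder{E} <: RT.AbstractEmbedder
        wrapped::E
        cache::Dict{String, Vector{Float32}}   # chunk text => embedding
    end

    function RT.get_embeddings(emb::CachedEmbedder, docs::AbstractVector{<:AbstractString}; kwargs...)
        misses = [d for d in docs if !haskey(emb.cache, d)]
        if !isempty(misses)
            # rate-limit handling and provider-specific batch splitting would live here
            embs = RT.get_embeddings(emb.wrapped, misses; kwargs...)   # assumed (dim, n) matrix
            for (d, col) in zip(misses, eachcol(embs))
                emb.cache[d] = collect(col)
            end
        end
        reduce(hcat, [emb.cache[d] for d in docs])
    end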

EasyContext.jl is an experimental thing, but I think many pieces could already be passed over to RAGTools… Voyage and Jina embedders and some others have been added there. I also think RerankGPT could have quite a few different versions… it is quite an opinionated thing: the order in which it processes contexts, and also the different ways the context batches are filled up. We could fill them so they all contain approximately the same number of tokens, or we could group chunks in the order they came in (hoping that chunks which come one after the other belong together, which would ease the selection for the reranker). Also, “dscode” and different models have different max-context limits, for which it would be cool if we could dispatch (“Feature request 2” could help).

AISH.jl just uses things from these libraries… it’s basically workflows.

Separate Repository
Can ONLY agree!
Changing some imports is NOTHING compared to the win.

Easier Config
I would prefer the dispatch types/configs… and I can only agree with the trade-offs. I think my biggest fear with the current kwargs approach is that I set a kwarg which goes to the wrong API, where it doesn’t error, and no one in the world would ever realize you didn’t actually set temperature=0.1; you just think you did. This has happened to me many times. Getting a runtime error is better than realizing in benchmarks, after spending $1-5, that the results are not really different because some settings didn’t go through.

Ideal dream world scenario

I would think the ideal scenario is one where everything is extensible, and things are developed in a way that lets us write separate packages to extend the current CORE functionalities. This is where I think dispatch might help us, but who knows what the best solution would be.

It would also be ideal if we could use simple agents to drive these tools/things to create whatever other code anyone needs. Getting things to sound as natural and as straightforward as possible, in structure and in naming, matters quite a lot for this!

Moving forward
I have a minor proposal for the modularization.
In EasyContext there are now 4 things (I call them “things” because I have no better word… I hope there will be one… it’s just a collection of tools/things, but I am not a fan of the word “tools”, because tools are a separate concept in the LLM context… actually I like these Things now xD after writing it down so many times xD. I wonder what better name we can come up with to make agents work with these more easily.):
AgentThings, ToolThings that agents can use, ChatThings (like managing files in chats, cutting the history of chats, or summarizing overly long chats), and the RAGThings context-creation stuff, which I think is eventually a RAGTools concern, but I don’t mind where that code lives…

A minor question… agents essentially work in a chat-like manner… how separate can ChatThings and AgentThings really be?

3) Removing FlashRank dep
Agree. I use BM25 keyword search exactly as it is in PromptingTools; I don’t use FlashRank.

Other Areas to Discuss
Context Handling
I think EasyContext follows the way you developed many parts of the system. I would say we need more Chunker ideas… Also, after being in this field for so long, I have realized that contexts shouldn’t be simplified into just strings. All we do is concat and cut strings to get new strings… it looks easy, since we can expect everything to be just a String, right? But when you want to track the history of something and where it came from, handle the overlap of chunks, track how a file changes and then update the content of a chunk, or track a chunk that is a function definition in a file whose line number changes because someone inserted a row above the function, then this “just strings” concept starts to need so much engineering that things become complicated…
Of course, these issues only arise if you happen to track workspaces or package functionality, but there are also things like emails/websites/chats (all of these could just stay strings, although websites also have a cache policy and their content sometimes changes… chats… ah…). I might be overthinking it a little bit now, but IMO we need to think through all the use cases…
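
For example, a minimal sketch of a provenance-aware chunk, instead of a bare String (all names are made up for illustration):

    # Made-up illustration: a chunk that knows where it came from.
    Base.@kwdef struct SourceChunk
        content::String
        source::String                  # file path, URL, email id, ...
        lines::UnitRange{Int} = 0:0     # position in the source, if applicable
        version::UInt64 = 0             # e.g. a hash of the source at chunking time
    end

    # Stale chunks can then be detected and re-chunked when the source changes:
    is_stale(c::SourceChunk, current_hash::UInt64) = c.version != current_hash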

Feature request 1
Do the modularization! And yeah, I tried to mention as many things as I could! :smiley:

Feature request 2
One other concern I recently realized: we cannot optionally call a model with as many parameters as we want. If I write a function and let users pass the model as an argument, and later someone also wants to pass temperature or top_n or anything else, then I need to change the function signature to also support temperature and top_n. I would prefer the model keyword to accept not only a model::String but also a model::ModelConfig, so api_kwargs would move into that.
Admittedly, this is a less lightweight solution, since we use structs, and struct abstractions IMO are not that easily optimized by compilers, but I am starting to think it is a sacrifice worth making for ease of use.
Passing a model config should be a one-argument thing where we want to give the user the option to specify it; the optional params should live inside this OpenAIModelConfig or AnthropicModelConfig. This might also drop the need for OpenAIFlavor and the many other flavors, as the model type is already there.
Also, right now, if we define a new model, we cannot really specify the Flavor; it is burned into a function… But I hope we can get rid of the Flavor thing entirely. If aigenerate ultimately uses an AbstractModelConfig instead of api_kwargs, things will get a lot simpler IMO. We can wire everything so that ModelConfigs are used throughout. It is debatable whether api_kwargs or a struct should be the thing from which we generate the REST headers and params for the REST call, but I would vote for the struct option. Letting users hook in with dispatch just before the REST params are built would be really cool extensibility: we could fix things very easily locally and later upstream the fix as a PR to the main repo.
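
To make the idea concrete, here is one possible shape as a design sketch (all names and fields are hypothetical):

    abstract type AbstractModelConfig end

    Base.@kwdef struct OpenAIModelConfig <: AbstractModelConfig
        name::String
        temperature::Union{Nothing, Float64} = nothing
        top_p::Union{Nothing, Float64} = nothing
        max_tokens::Union{Nothing, Int} = nothing
    end

    # The model keyword then carries everything model-related in one argument, e.g.
    #   aigenerate(prompt; model = OpenAIModelConfig(name = "gpt-4o", temperature = 0.1))
    # and dispatch on the config type could replace the Flavor/api_kwargs machinery.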

Feature request 3
Sorry, I just realised that some APIs expose the same feature under stop_sequence and others under stop, and some accept a vector of strings while others accept just one string. So I wonder if we could make this universal. Also, on the return side: some APIs report whether they stopped on the stop sequence, while others just return “done” in the stop field… xD this is… so not cool… but the one that returns “done” accepts only one keyword, so there we know what the stopping keyword was. These cases should also be handled, which I think gets solved by Feature request 2.
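
As a tiny hypothetical sketch of the normalization (the per-provider field names are from their public APIs; the helper is made up):

    # Accept either a single stop string or a vector of them; normalize to a vector.
    normalize_stop(stop::AbstractString) = [String(stop)]
    normalize_stop(stop::AbstractVector{<:AbstractString}) = String.(stop)

    # A per-provider config (Feature request 2) would then emit the right field name:
    #   OpenAI:    (; stop = normalize_stop(stop))
    #   Anthropic: (; stop_sequences = normalize_stop(stop))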

Feature request 4
Yeah, another sorry: I wonder if I am seeing things right, but I think aigenerate has too many dispatch configurations… I have seen at least 2 well-separated things in there which I think should get separate names; the point is that I want to know a little better which one is getting called… I wasn’t sure whether it was the schema or the rendered thing… or where it was. The reason I mention this is that aigenerate is, IMO, a little hard to follow; I hope it will get simpler with Feature request 2. :slight_smile:

Also, one minor thing:
in user_preferences, the list of models requires us to specify the model name twice, which I think could be eliminated.

1 Like

Totally +1 for the refactoring, and I suspect it will lead you to even more simplifications.

I would say: break anything you need to achieve simplicity! EVERY single simplification or modularization at this point is a 100% win in the long term. We will adapt, IMO.

We used it in EasyContext.jl, as you know. I think the RAG is a little more complex than it should be. But not by much; it is not that bad! At this point, though, I would be open to your idea of how you imagine the BEST and simplest flat version of that RAG design, because I think what you create should basically be a very small and simple flat core that can be extended very simply.
So for me I want to see:

Generation 1 of RAGTools was what lived in PromptingTools.
Now break everything and make it as simple as possible; don’t add anything extra, just the core and some algorithms that implement that core in the module. That will give us Generation 2.

This RAGTools has to be the CORE of everyone’s process! It has to be simple, general, and extendable, with ready-made examples. :slight_smile:

Also, one maybe very stupid opinion of my own: I love one-liner breaks and returns, so Julia looks actually extremely beautiful when you write:
!isempty(error) && return "..."
It further simplifies the code and can also lead to extra optimization, IMO.

Also, having fewer tokens means we can work better with any package. So smaller is better these days, IMO.

For Feature 2:
There could be a specific struct that describes the possible ways to call every single model (we could immediately see the supported parameters and what we should add).

If such a struct exists for the called model, then we could try to parse the kwargs into it, and if that works, we have validated the call (see the sketch below).

Just an idea… I know this is extremely many hours of work, maybe not the priority. Perhaps it could be done by an AI in 6 months?

Maybe by creating structs for everything we make things more rigid, but at the same time the structures sort of define protocols that clarify how things can and should be used. So maybe it is worth the investment.
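
As a sketch of the validation idea (names made up): parsing the kwargs into a per-model struct fails loudly on anything the model does not support, which would also catch the silent-typo problem mentioned earlier in the thread:

    # Made-up per-model call spec; @kwdef constructors throw on unknown keywords.
    Base.@kwdef struct GPT4oCallSpec
        temperature::Float64 = 1.0
        top_p::Float64 = 1.0
        max_tokens::Int = 4096
    end

    validate_call(spec_type, kwargs) = spec_type(; kwargs...)

    validate_call(GPT4oCallSpec, (; temperature = 0.1))    # ok
    # validate_call(GPT4oCallSpec, (; temprature = 0.1))   # throws: typo caught at call time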

I went back to the RAG pipelines and realized that find_closest also does a sort, but I think sorting is not its responsibility.

I think when there are multiple embedding strategies, we would only need the scores, and the user should decide whether to average, weight, or take the max of each document’s scores to decide which documents are most relevant.

So we should have multiple different scoring methods. I hope this can get into the next iteration!
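
A hypothetical sketch of that separation (the strategy names are made up): each strategy only returns raw scores, and the caller combines and sorts once at the end:

    scores_dense = score(DenseStrategy(), index, question)   # made-up scoring API
    scores_bm25  = score(BM25Strategy(), index, question)

    combined = max.(scores_dense, scores_bm25)               # or mean, or a weighted sum
    top_idx  = partialsortperm(combined, 1:10; rev = true)   # sort once, at the end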

Code Organisation and Cleanup

Code Organisation & Cleanup: Are there parts of RAGTools that feel scattered or disorganised? I know Pablo V. has some thoughts on this!

As @svilupp mentioned, I have some comments on code organisation and cleanup. When I tried to make my first contribution to RAGTools (within PromptingTools), I struggled to find the different parts of the logic I needed to modify, due to the size of some files and the lack of organisation within them. In my opinion, files like types.jl or retrieval.jl should be made into folders (not modules, just folders), separating the different concepts into different files. For example, from types.jl I would create the following folder:

types/
   types.jl 
   chunks.jl 
   document_term_matrices.jl
   indexes.jl
   rag_results.jl

From my point of view, splitting the types.jl file seems natural when you have such clearly differentiated concepts. Logic that may involve different types as well as the definition of constants can simply be included in types.jl. This makes it easier to find the logic you need, and easier to add new logic in the right place, without making it more difficult for the next contribution or extension.
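
Under that layout, the umbrella types.jl would simply include the concept-specific files, e.g.:

    # types/types.jl -- shared constants and cross-type logic stay here;
    # each concept lives in its own file.
    include("chunks.jl")
    include("document_term_matrices.jl")
    include("indexes.jl")
    include("rag_results.jl")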

The main arguments against this so far are

  1. When using coding agents, larger files make it more likely / easier to find the right piece of code (although it costs more tokens).

    I haven’t experienced this problem myself, as I usually compose up to 4 files to make a query to the coding agent.

  2. When it comes to finding code for newbies, they should use search - it’s super fast, I often use it even when I know exactly where some code is.

    I don’t think using search is against my proposal. Search should always be the fastest way to find a particular name or signature in the codebase, and I don’t think a file should be split just because it’s long, but only in cases where there are different types or concepts. For example, splitting files like types.jl and retrieval.jl would help to have a structure of the existing types and go directly to the relevant type for the contribution.

I am aware that the focus of the discussion is on features/functionality and configuration, and that this may seem irrelevant to some of you. However, I believe that making the code and structure of the package easy to understand is key to keeping the package scalable and making it easier for future contributors. I also think it is very important if many developers are going to have to dive into the source code to make the necessary customisations for their applications.

P.S. This is my very first post :smiley: Happy to actively contribute to the Julia language.

2 Likes

Thank you, everyone!

It seems that we have consensus on separating the package, so the first goal will be to ensure a smooth transition (likely with no functional changes, to avoid any confusion between versions and packages).

The placeholder repo is here for now: GitHub - JuliaGenAI/RAGTools.jl: All-in-one RAG toolkit—from quick prototypes to advanced pipelines.
It’s only a wrapper (importing the PT modules for now) and it doesn’t have the tests, docs, etc. But just putting it on your radar.

As for the feature requests, they all make sense. I think the highest quality-of-life improvement would be the unified ModelConfig to bundle the model name and API kwargs.

Do you have any views on the design? Please do not paste AI slop here (massive code blocks), just design proposals.
My first intuition is that it should live in PromptingTools, and I keep going back and forth on whether it should have an associated PromptSchema. I believe it should, because without the Schema (e.g., the OpenAI API) you don’t know how to format the request or where to send it.

The associated choice could be to allow PromptSchemas to carry URL, API key, etc. overrides, so that we don’t have to define custom methods for each API provider and can just define what we need in the Schema spec. Any views on that?

1 Like

FYI.

I’ve finished the carve out, docs clean up + triggered the registration (status: New package: RAGTools v0.1.0 by JuliaRegistrator · Pull Request #124608 · JuliaRegistries/General · GitHub).

Repo link: GitHub - JuliaGenAI/RAGTools.jl: All-in-one RAG toolkit—from quick prototypes to advanced pipelines.

Timeline:

  • PromptingTools is on version 0.72 now
  • the goal is to remove all RAG-focused functionality (RAGTools sub-module) by version 0.75!

EDIT: It is now registered and can be added directly! I.e., no more adding 5 packages to trigger the extensions :slight_smile:

1 Like