@svilupp and I were talking on Slack about what a DSL might look like for generative models. I wanted to stick this on the forum as a more sticky conversational place, so I hope this can serve as a kind of evolving document.
The original Slack conversation is copied at the end.
The gist basically is that @svilupp is tinkering with tools for writing mini LLM programs, and I have also been intermittently tinkering with the same. @svilupp is adding stuff to PromptingTools.jl that processes an arbitrary program determined by a series of language model calls.
Here’s his example:
```julia
@aimodel function my_model(n=2; model="gpt3t")
    # add a soft check for our AI task
    # syntax: @aisuggest CODEBLOCK CONDITION FEEDBACK
    # or simply @aisuggest CONDITION FEEDBACK if already INSIDE a CODEBLOCK
    @aisuggest begin
        # @airetry will simply re-run if the call fails
        @airetry greeting = ai("Say hi $(n)-times"; model)
        # Nested @aiassert - hard check ("hi" is 2 chars, so halve the length difference)
        count_hi = (length(greeting) - length(replace(greeting, "hi" => ""))) ÷ 2 == n
        @aiassert count_hi "There must be exactly $(n) 'hi' in the greeting"
        greeting_check = occursin("John", greeting)
    end greeting_check "Greeting must include the name John"
    return greeting
end
```
This program would repeatedly re-run a language model call until the conditions are met. In this case, that'd be the language model saying "hi" exactly n times in a greeting that includes the name John. I love this framework and wanted to contribute a generalization of the code above, to think about what an abstract spec of this might look like.
I could imagine more powerful programs like this one, where this program is being run on a robot assistant named JerryBot.
User: My mom left her keys, wallet, and glasses on the table at McDonalds. Could you run back and get them?
Now, JerryBot has stuff to do. It has to follow the flow starting from the input from this user, which is a request to pick some stuff up from a table at McDonalds.
JerryBot has to figure out
- What did I just get? Is this a request, a statement, or other? If it’s a statement, save it to memory for later. We’ll ignore “other” for now, but in principle you can add separate control flows for other.
- If it’s a request, you can enter a separate control block. In this case, you need to know some more things. Here’s a small list of a few you might consider:
- Who asked? I may only respond to certain people.
- What kind of request is this? Retrieval, shut down, other?
- If it’s a retrieval:
- What do I have to get? Extract a list if multiple.
- Where is it?
- Do I need to know anything else?
and more. The idea here is that you can build up weird programmatic flows by contextualizing, extracting text, etc., until you have some kind of result for any arbitrary query. A sketch of what this might look like follows.
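To make that concrete, here's a rough sketch of JerryBot's flow in the style of the `@aimodel` DSL above. Everything in it is hypothetical: the typed `ai(...)::T` calls and the helpers (`save_to_memory`, `authorized`, `go_fetch`) are assumptions for illustration, not anything that exists in PromptingTools.jl or colang today.

```julia
# Hypothetical sketch; typed ai(...)::T calls and all helpers are assumed.
@aimodel function jerrybot(user_msg; model="gpt3t")
    # What did I just get? A request, a statement, or other?
    kind = ai("Classify as :request, :statement, or :other: $(user_msg)")::Symbol
    if kind == :statement
        save_to_memory(user_msg)   # save statements for later
    elseif kind == :request
        # Who asked? I may only respond to certain people.
        requester = ai("Who is asking? $(user_msg)")::String
        authorized(requester) || return "Sorry, I only take requests from family."
        # What kind of request is this? Retrieval, shut down, other?
        if ai("Is this a retrieval request? $(user_msg)")::Bool
            # What do I have to get? Extract a list if multiple. Where is it?
            items = ai("List the items to retrieve: $(user_msg)")::Vector{String}
            place = ai("Where are the items? $(user_msg)")::String
            return go_fetch(items, place)
        end
    end
    # We ignore :other for now; separate control flows could handle it.
end
```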
As a practical example, I could imagine creating an auto-documenter that goes through every person’s Julia source code and follows a series of steps to iteratively refine the documentation:
- Does this repo have Documenter set up? If no, do so. Otherwise, proceed.
- What does this package do? Please provide a list of steps to generate documentation for the package. This would be something like write a home page, add the API spec, manuals for X use cases, etc.
- For each step, recursively generate subtasks. Each task runs until the model decides it knows the answer directly and no further subtasks are generated (see the sketch after this list).
- Validate each step. Check your work – did you accomplish the goal? If not, please redo.
- If there is any code you have written, please execute it to ensure that it works. This would basically entail giving your LLM a REPL to use, and it can look at the callstacks and maybe automatically determine what to fix.
- Save the code, prepare a pull request.
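A minimal sketch of that recursive loop, again using the hypothetical DSL from above; the helpers (`has_documenter`, `setup_documenter`, `prepare_pull_request`) and the typed `ai(...)::T` calls are made-up names for illustration.

```julia
# Hypothetical sketch; every helper name here is an assumption.
function document_package(pkg_dir)
    # Does this repo have Documenter set up? If not, do so.
    has_documenter(pkg_dir) || setup_documenter(pkg_dir)
    # Ask for a documentation plan, extracted as a list of steps.
    plan = ai("Plan the documentation steps for the package at $(pkg_dir)")::Vector{String}
    # Run each step, recursing into subtasks as needed.
    foreach(run_task, plan)
    # Save the work and prepare a pull request.
    prepare_pull_request(pkg_dir)
end

function run_task(task)
    subtasks = ai("Break '$(task)' into subtasks; return an empty list if atomic")::Vector{String}
    if isempty(subtasks)
        result = ai("Complete this documentation task: $(task)")
        # Validate: did you accomplish the goal? If not, redo.
        @aiassert ai("Does this accomplish '$(task)'? $(result)")::Bool "Redo: the goal was not met"
    else
        foreach(run_task, subtasks)
    end
end
```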
I think that this type of program is extremely powerful, and I think that Julia is an extraordinary tool for working with this kind of thing. We’re good at DSLs, and I think you can make some absolutely gorgeous programs with language models when you give them a focused application and guidance.
I’ve started some shitty tinkering in colang, and @svilupp is kind of approaching it in his delightful way. I think my idea is to try to define some kind of type system for language. For example, I should be able to pose queries like
- Is this true or false?
- Would you say yes or no, assuming you are [insert perspective]?
- Does this seem to be about Z?
- Is this a list? For this one, you can also imagine a nested call that extracts the items in a list using structured text extraction.
and receive a wrapper type around the expected response type. That wrapper would contain the raw response (“yes”) as well as a reduction of that response (true). There are maybe a few other response types (categorical selection, code evaluation, etc.).
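For instance, the wrapper might look something like the following; `AIResponse` and the typed query are names I'm making up for illustration, not anything implemented in colang or PromptingTools.jl.

```julia
# Hypothetical wrapper type; all names are assumptions for illustration.
struct AIResponse{T}
    raw::String   # the raw model output, e.g. "Yes, that's true."
    value::T      # the reduction, e.g. true
end

# A typed query like ai("Is this true or false? ...")::Bool would return an
# AIResponse{Bool}: downstream code branches on .value, while .raw stays
# available for logging or feedback.
resp = AIResponse("Yes, that's true.", true)
resp.value && println("Model affirmed: ", resp.raw)
```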
I’m very interested in working on this, so I hope I can see some perspectives from people here.
Slack Log
Jan Siml 23 hours ago
Over the weekend, I’ve been playing with writing a compiler/DSL for writing mini LLM programs. The purpose is two-fold: being more declarative and being less verbose. The hope is that once we know what the user wants to do, we could help them optimize the function (eg, the prompt, call parameters). I discovered too late that @goto/label doesn’t work how I wanted, so I’ll need to rewrite a bunch of stuff. I figured it’s a good opportunity to ask if I’m just wasting my time or going in the wrong direction.
Jan Siml 23 hours ago
Would you be able to use something like this?
```julia
@aimodel function my_model(n=2; model="gpt3t")
    # add a soft check for our AI task
    # syntax: @aisuggest CODEBLOCK CONDITION FEEDBACK
    # or simply @aisuggest CONDITION FEEDBACK if already INSIDE a CODEBLOCK
    @aisuggest begin
        # airetry will simply re-run if the call fails
        @airetry greeting = ai("Say hi $(n)-times"; model)
        # Nested @aiassert - hard check
        count_hi = length(greeting) - length(replace(greeting, "hi" => "")) == n
        @aiassert count_hi "There must be exactly $(x) 'hi' in the greeting"
        greeting_check = occursin("John", z)
    end greeting_check "Greeting must include the name John"
    return greeting
end
```
It would effectively rewrite into a proper function and add the necessary boilerplate. Motivation and explanation of the syntax here: https://github.com/svilupp/PromptingTools.jl/blob/add-compiler-macro/src/Experimental/AgentTools/DSL_README.md
EDIT: There is no code to test yet - it doesn’t work yet and I’m too embarrassed about its current state
Jan Siml 23 hours ago
I’m keen to learn:
- would it be broadly useful (thinking about agentic workflows/automations)
- is something too hard to understand? what would simplify it? (within the limitations of what’s reasonably doable with macros)
SixZero 21 hours ago
I am interested, as I think many of us are. I see some ideas written down for where it would be useful, though I would still want to see examples of when these happen. Also, figuring out intuitive ways to use it is, I think, somewhat important
Jan Siml 20 hours ago
Have you seen GitHub - stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models? It’s very much based on that.
SixZero 20 hours ago
Yeah, opened it, although couldn’t really read it over, but looked promising.
Jan Siml 19 hours ago
I’d say it depends on what you’re using GenAI for…
- If you don’t need to chain a few AI calls together (eg, extract something, use it in the next call as input), you don’t need it… or if you call OpenAI 100x at once, chances are that one of the calls will fail, so it helps with how you handle that
- if you don’t mind writing prompts and tweaking them “manually”, the optimization is useless for you (and hence so is using the DSL)
Maybe it’s less “broadly” useful than I thought, which means I can focus on other projects!
That’s also super valuable feedback
Cameron Pfiffer 12 hours ago
Oh I love this!
Cameron Pfiffer 12 hours ago
I have some stuff in colang for structured text in control flow, like yes/no, contains x, etc. Would be lovely to have that as well
SixZero 11 hours ago
No, I actually think these issues are super reasonable and are anyone’s issues, so it is definitely worth it @svilupp
SixZero 11 hours ago
I just wanted to know how our life would be easier with it… eventually I want things to work in an agent-centric manner, improving their solutions step by step…
SixZero 11 hours ago
Just not quite right there tho at this point
Jan Siml 11 hours ago
I want things to work in an agent-centric
Can you say more? Do you have a specific example of how you run something today vs what would make things simpler?
With “agents”, I struggle to find anything super useful besides a few chained AI calls with some validation (I think validation / self-retry is actually the most useful bit of the DSL above). What’s your take?
SixZero 11 hours ago
Actually, I still don’t have a specific example, more of a dream; I am thinking about how things would be useful…
Also, there are some somewhat good solutions to this, eg, gpt-pilot… I guess the value of this agent thing is:
- It has a lot of time to work, it can work the whole day.
- It needs to be “applyable” easily to whole projects, and work on them.
- Accuracy of its suggestions must be improved; making the code worse is just not fun. Probably the problem here is that we also don’t really know what is better code and what is not.
gpt-pilot is not good at easy application, in my opinion.
The 3rd point might be improvable if the system could come up with 3-4 different solutions the next day for solving things. Also, what I see in Copilot is that it gives a pretty nice git diff of the code changes… that is somewhat a good format for a human to look at, to decide if something got improved or not… showing multiple solutions in this manner is, I don’t know how hard…
SixZero 11 hours ago
Copilot excels at the 2nd point!
SixZero 11 hours ago
Accuracy could be improved… also, it could use the whole day for improving things…
Jan Siml 11 hours ago
I have some stuff in colang for structured text in control flow, like yes/no, contains x, etc. Would be lovely to have that as well
@Cameron
Can you expand on that? I’d love to see a mock of what more complicated control flow looks like!
Re. structured extraction, the idea here is that `aiextract` is verbose, so you can write, eg, `z = ai(…)::MyFancyType` → that makes it obvious that the compiler should use `aiextract`. Plus `z` will already be your `MyFancyType` instance (no need to call the `AIMessage().content` accessor).
For yes/no, we should probably leverage `aiclassify()`, which uses the logit_bias trick to answer in one token - either true/false (you could tweak it to be yes/no)… We could add to the compiler `ai(…)::Bool` → `aiclassify`.
For “contains x” AI calls, we could have a rule that if `@aiassert`/`@aisuggest` are missing a condition, the “feedback” text becomes something to pass to an LLM judge → an `aiclassify()` call that will answer true/false (with low temperature):
`@aiassert begin…my code…end "<statement>"` → `@aiassert begin…my code…end aiclassify("statement") "<statement> is not true"`
For string-based `occursin` conditions, it would simply be `@aiassert CODE_BLOCK occursin(…) "feedback"`.
In general, what would be a killer feature/use case that would make it worth switching to this syntax? Personally, I can write all these control flows very quickly, so it’s hard for me to see if the cost of learning a new syntax (eg, everything is an `ai()` call and the `@ai…` macros) is worth it for users
Cameron Pfiffer 10 hours ago
Okay, I have some thoughts here but will type em up later. Wonder if this is Discourse-able?