I tried on an old i7 running Ubuntu Server 24.04 and something went off the rails, which seems to trace to Claude devouring 32 GB of RAM and another 72 GB of swap. Needless to say, the box seized up. Since an MCP server is not strictly necessary for the reasons I run the server, I’m trying to back out to just Ollama. Not at all a slam on Kaimon, just context for anyone else experiencing the same with Claude.
Claude Code is a rapidly evolving piece of software itself, and I’ve seen a few cases where, over long periods, the claude executable seemed to be leaking memory. It hasn’t been reproducible, but it’s something to be aware of. If you’re using Ollama, I’m curious which model(s) you’re running locally. Do you integrate that with something that acts as the ‘agent’?
Kaimon uses Ollama and an embedding model (currently qwen-embedding:0.6b by default) for semantic indexing/search. The model is configurable.
Claude Code is only one of many agentic AI coding assistants available; I wonder if you might try again with another one?
Not sure if it is the same issue, but Claude Code can be buggy like this when tool results are large. As a workaround I limit output in my MCP server to less than 100 KB: SandboxMCPRepl.jl/test/runtests.jl at f98804309897a7b7c5cdf0eacd2bd9a679fa4a95 · nhz2/SandboxMCPRepl.jl · GitHub
Kaimon limits tool output as well. For example, in the main code execution tool, the max_output parameter can be adjusted by the agent to control the maximum output size; it defaults to 6,000 characters.
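For anyone implementing a similar guard in their own MCP server, the idea is just to cap the result before it ever reaches the client. A minimal sketch in Julia (the function name is mine, not Kaimon’s actual API; only the 6,000-character default comes from this thread):

```julia
# Hypothetical server-side truncation helper for MCP tool results.
# Caps by character count and appends a note so the agent knows output was cut.
function truncate_output(s::AbstractString; max_output::Int = 6_000)
    length(s) <= max_output && return String(s)
    head = first(s, max_output)  # first max_output *characters*, not bytes
    return head * "\n… [truncated: showing $(max_output) of $(length(s)) chars]"
end
```

Calling it on every tool result keeps any single response bounded regardless of what the executed code prints.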
I frequently see this in the claude log:
● kaimon - run_tests (MCP)(pattern: "ThreeWindingSpec", verbose: 2)
⎿ ERROR: FieldError: type Kaimon.TestRun has no field `reader_done`, available fields: `id`, `project_path`, `started_at`, `finished_at`, `status`, `pattern`, `results`, `failures`, `raw_output`, `total_pass`, `total_fail`, `total_error`, `total_tests`, `pid`, `process`
● kaimon - run_tests (MCP)(pattern: "ThreeWindingSpec")
⎿ ERROR: FieldError: type Kaimon.TestRun has no field `reader_done`, available fields: `id`, `project_path`, `started_at`, `finished_at`, `status`, `pattern`, `results`, `failures`, `raw_output`, `total_pass`, `total_fail`, `total_error`, `total_tests`, `pid`, `process`
● Bash(julia --project -e 'using Pkg; Pkg.test()' 2>&1)
⎿ Status `~/dev/projects/2026/transformer-thermal/TransformerThermalModel.jl/test/Project.toml`
[f8b46487] TestItemRunner v1.1.4
[1c621080] TestItems v1.0.0
… +67 lines (ctrl+o to expand)
⎿ (timeout 2m)
It seems there is some issue with kaimon - run_tests, where the agent gives up and runs the tests manually. Any idea what would cause this or how to resolve it?
I just tried out the 1.1 branch and it fixes the memory leak issue! This is 100% a Claude Code bug, not a Kaimon bug, but the workaround seems to be effective. If you want to reproduce it, here is a prompt to try:
Prompt to help reproduce claude code memory leak
Reproduce memory leak from MCP server returning large output.
Steps:
- Fetch the mcp__kaimon__manage_repl and mcp__kaimon__ex tool schemas
- Run manage_repl with command "restart"
- Run mcp__kaimon__ex with e="'x'^100_000_000", q=false, max_output=100000000
- This will produce a persisted-output saved to a JSON file. Use the Read tool on that file with offset=4, limit=1 to read line 4 (the 100M character line).
- While reading, the human presses the CTRL and O keys at the same time.
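For a sense of scale, the expression in the third step uses Julia’s string repetition operator; at full size it builds a string of roughly 100 MB that then has to travel through the tool-result path. A safe small-scale version of the same operation:

```julia
# Same operator as the repro's e="'x'^100_000_000", at a harmless size.
s = "x" ^ 1_000
@assert sizeof(s) == 1_000  # 1 byte per ASCII char; 100_000_000 reps ≈ 100 MB
```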
Results with 1.1 branch of kaimon
● kaimon - ex (MCP)(e: "'x'^100_000_000", q: false, max_output: 100000000)
⎿ "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
… +149 lines (ctrl+o to expand)
● The output was truncated to 25000 chars but mentions 100000002 total chars. Let me look for the persisted output file.
● Searched for 4 patterns (ctrl+o to expand)
● The MCP tool's max_output parameter has a hard cap of 25,000 characters (as noted in the schema: "max: 25000"), so despite requesting 100M, the output was truncated to 25,000 chars. No persisted-output file was created.
The memory leak scenario as described can’t be fully reproduced because:
- The max_output parameter caps at 25,000 characters server-side
- The 100M character string was generated in Julia but only 25K chars were returned to the client
- No persisted-output JSON file was saved to disk
Would you like to try a different approach to reproduce the leak, or investigate whether there’s a way to bypass the output cap?
Try the 1.1 branch. It will probably be released today, but I wouldn’t mind some early feedback.
I’ve made a number of improvements to the way Kaimon’s ZMQ messaging layer works. Not sure if that’s it, but worth a try.
Also, agents sometimes seem to revert to using grep/shell, presumably because they have a lot of reinforcement training on those tools. It can be necessary to reorient agents toward using MCP tools. Hopefully model improvements will mitigate this over time; in the meantime, there are things you can do like putting reminders/guidance in CLAUDE.md, etc.
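As a hypothetical example of the kind of reminder that can help (the wording is mine, not from Kaimon’s docs), a CLAUDE.md section might look like:

```markdown
## Tool usage
- Prefer the Kaimon MCP tools (e.g. run_tests, ex, manage_repl) over raw
  shell commands like grep or `julia --project -e 'using Pkg; Pkg.test()'`.
- If an MCP tool returns an error, report it to the user rather than
  silently falling back to Bash.
```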
Yeah, I tried to force it to use MCP, but it just hits a wall and says:
● The run_tests tool is hitting an internal kaimon server bug (FieldError: type Kaimon.TestRun has no field reader_done).
This appears to be a bug in the kaimon server itself — the TestRun struct was likely updated but the server code still
references an old field.
The tests themselves are passing fine — I confirmed this earlier via julia --project -e 'using Pkg; Pkg.test()' which
showed all 15 tests passing. The LSP diagnostics you see in the IDE are also false positives since it doesn't understand
the @testsnippet/@testitem macros from TestItemRunner.jl.
If you can reproduce this with 1.1, would you file an issue? I believe it should already be fixed if it was a bug previously.