What are the best practices for letting coding agents start and manage Julia sessions for testing and debugging, while keeping the workflow safe, reproducible, and easy to reason about?
I think many people could find this useful, so maybe we can share our experiences, tips, and tricks in the comments below.
Julia depends very heavily on the REPL, so you want your agent to have access to a REPL.
For me, that means spinning up a REPL in a separate tmux/zellij session. Then the agents can just use send-keys to access the REPL and get fast feedback.
I currently use the Claude Code plugin (in a non-edit mode) in Codium. I’m at the experimental stage, so for now it’s more of a “how can I optimize this code” or “can you find bugs in this file” kind of use.
I have also experimented with a purely local setup using OpenCode linked to Qwen 3.6 or GPT-OSS but have found the models inferior.
My use is rather simple, so I’m also interested in what others are doing.
Has anyone been able to set up OpenCode to use a running Julia REPL? Out of the box it restarts Julia all the time and makes trivial mistakes when setting up environments, etc.
Use an MCP server like kaimon.jl. You can have a persistent session with running state, which eliminates the startup cost of CLI julia.
I ran a test with the Copilot CLI, asking it to use Julia to create a DataFrame df with a single column, data, holding a single row with the value 1. I had to ask it to use kaimon; otherwise it would spin up its own Julia shell. Once asked, it connected to the MCP server, which you can monitor in the kaimon TUI. I’m just getting started, but kaimon looks quite capable, and the agent knows quite a lot. You can use the ex MCP command, or just ask it to do stuff in natural language.
BTW, Copilot can also create a persistent Julia session in a shell and refer back to it. Not sure what the limitations are compared to MCP.
Haven’t tried OpenCode, but I expect it would be similar.
I use a skill (a markdown file that gives the agent instructions) that I had Claude write, which tells the agent how to find an existing tmux session, start one if it doesn’t exist, start/restart the REPL, and send code to the REPL.
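For anyone who wants to script the steps such a skill describes, here is a minimal sketch in Python. It is not the skill mentioned above, just one possible way to do it. The session name julia-repl and all function names are made up for illustration; the only tmux subcommands used are has-session, new-session, and send-keys.

```python
import shlex
import subprocess

SESSION = "julia-repl"  # hypothetical session name, pick your own


def has_session_cmd(session=SESSION):
    # Exits with status 0 if the session exists, non-zero otherwise.
    return ["tmux", "has-session", "-t", session]


def new_session_cmd(session=SESSION, project="."):
    # -d starts the session detached; the last argument is the command
    # that runs in its first window (here: a Julia REPL for the project).
    julia = f"julia --project={shlex.quote(project)}"
    return ["tmux", "new-session", "-d", "-s", session, julia]


def send_code_cmds(code, session=SESSION):
    # -l sends the text literally (no key-name lookup), then Enter submits it.
    return [
        ["tmux", "send-keys", "-t", session, "-l", code],
        ["tmux", "send-keys", "-t", session, "Enter"],
    ]


def ensure_session(session=SESSION, project="."):
    # Reuse the session if it is already running, otherwise start it.
    found = subprocess.run(has_session_cmd(session), capture_output=True)
    if found.returncode != 0:
        subprocess.run(new_session_cmd(session, project), check=True)


def send_code(code, session=SESSION, project="."):
    ensure_session(session, project)
    for cmd in send_code_cmds(code, session):
        subprocess.run(cmd, check=True)
```

With tmux and Julia installed, send_code("using Pkg; Pkg.status()") would reuse or start the session and type the line into the REPL. Keeping the command construction separate from the subprocess calls makes it easy to check what will be sent before anything runs.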
I have started bundling package-maintenance tasks into skills. I would rate these as “maturing” rather than “mature,” as I am still regularly finding ways to improve the skills to ensure higher-quality results. I’ve also made some effort to improve context-efficiency by being specific about certain instructions (“use a subagent to …” or “don’t read the source, extract this from Base.Docs.meta(MyPackage) using the following script:”).
I just verified that a prompt along the lines of “start a tmux session, write to it using tmux send-keys, and read the response using tmux capture-pane” works. But it’s not a genuinely interactive session, and it seems inefficient and brittle: by default, capture-pane returns the whole visible screen and nothing more, so the LLM has to spend its context window diffing two capture-pane “states”. This looks harder than it should be.
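One way to keep the diffing out of the model’s context is to do it in a small helper and hand the agent only the new lines. The sketch below is an assumption-laden illustration, not a tested workflow: it relies on capture-pane’s -p flag (print to stdout) and -S - (start from the beginning of the scrollback), and it assumes the earlier capture is still a prefix of the later one, which stops being true once the pane is cleared or the scrollback exceeds tmux’s history limit. The function names are hypothetical.

```python
import subprocess


def capture_cmd(session):
    # -p prints the pane contents to stdout; "-S -" starts at the
    # beginning of the scrollback instead of the top of the visible screen.
    return ["tmux", "capture-pane", "-p", "-t", session, "-S", "-"]


def capture(session):
    result = subprocess.run(
        capture_cmd(session), capture_output=True, text=True, check=True
    )
    return result.stdout


def _clean(text):
    # Strip trailing spaces per line and drop the empty rows at the bottom.
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return lines


def new_output(before, after):
    # Return only the lines of `after` that follow the common prefix
    # shared with `before`.
    old, new = _clean(before), _clean(after)
    i = 0
    while i < len(old) and i < len(new) and old[i] == new[i]:
        i += 1
    return "\n".join(new[i:])
```

The intended use is: take a capture, send the code, wait for the prompt to return, take a second capture, and pass only new_output(before, after) to the agent. If the prefix assumption fails, the function falls back to returning most or all of the second capture, so it degrades to the plain capture-pane behavior rather than dropping output.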