Best Practices for Using Coding Agents with Julia?

Simple question, just like the title says:

What are the best practices for letting coding agents start and manage Julia sessions for testing and debugging, while keeping the workflow safe, reproducible, and easy to reason about?

I think many people could find this useful, so maybe we can share our experiences, tips, and tricks in the comments below.

Julia depends very heavily on the REPL, so you want your agent to have access to a REPL.

For me, that means spinning up a REPL in a separate tmux/zellij session. Then the agents can just use send-keys to access the REPL and get fast feedback.
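A minimal sketch of what "just use send-keys" can look like when driven from Python. The pane target `julia:0.0` and the helper names are made up for illustration. The only tmux feature relied on is `send-keys` with `-t` (target) and `-l` (send the text literally instead of interpreting it as key names).

```python
import subprocess


def send_line_commands(target: str, code: str) -> list[list[str]]:
    """Build the tmux commands that type one line of code into a pane and run it."""
    return [
        # -l types the text literally, so a word like "Enter" inside the
        # Julia code is not mistaken for a key name.
        ["tmux", "send-keys", "-t", target, "-l", code],
        # A second call presses the Enter key so the REPL evaluates the line.
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def send_line(target: str, code: str, run=subprocess.run) -> None:
    """Send a single line to the REPL pane. `run` is injectable for dry runs."""
    for argv in send_line_commands(target, code):
        run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in send_line_commands("julia:0.0", "using Pkg; Pkg.status()"):
        print(argv)
```

This only handles single-line input. Multi-line code probably needs more care (for example, writing it to a file and sending an `include(...)` call instead).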

I currently use the Claude Code plugin (in a non-edit mode) in Codium. I’m at the experimental stage, so for now it’s more of a “how can I optimize this code” or “can you find bugs in this file” type of use.

I have also experimented with a purely local setup using OpenCode linked to Qwen 3.6 or GPT-OSS but have found the models inferior.

My use is rather simple, so I’m also interested in what others are doing.

Has anyone been able to set up OpenCode to use a running Julia REPL? Out of the box it restarts Julia all the time and makes trivial mistakes when setting up environments, etc.

Use an MCP server like Kaimon.jl. You get a persistent session with running state, which eliminates the startup cost of launching julia from the CLI each time.

I did a test with Copilot CLI, asking it to use Julia to create a DataFrame df with a single column, data, holding a single row with the value 1. I had to ask it to use Kaimon; otherwise it would spin up its own julia shell. But it connected to the MCP server, which you can monitor from the Kaimon TUI. I’m just getting started, but Kaimon looks quite capable, and the agent knows quite a lot. You can use the ex MCP command, or just ask it to do stuff in natural language.

BTW, Copilot can also create a persistent julia session in a shell and refer back to it. Not sure what the limitations are compared to MCP.

Haven’t tried OpenCode, but I expect it would be similar.

I use a skill (a markdown file that gives the agent instructions) that I had Claude write, which tells the agent how to find an existing tmux session, start one if it doesn’t exist, start/restart the REPL, and send code to the REPL.
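As a rough illustration of the "find an existing tmux session, start one if it doesn’t exist" step, here is a sketch in Python. It is not the skill described above. The session name `julia-agent` and the function name are placeholders. It relies on `tmux has-session -t NAME` exiting with a non-zero status when the session is missing, and on `tmux new-session -d -s NAME COMMAND` to start a detached session running the given command.

```python
import subprocess


def ensure_session(name: str, command: str = "julia", run=subprocess.run) -> bool:
    """Make sure a tmux session called `name` exists.

    Returns True if a new session was started, False if one was already there.
    `run` is injectable so the logic can be exercised without tmux installed.
    """
    # has-session exits with status 0 when the session exists.
    probe = run(["tmux", "has-session", "-t", name], capture_output=True)
    if probe.returncode == 0:
        return False
    # -d keeps the new session detached; `command` (here the Julia REPL)
    # runs inside the session's first window.
    run(["tmux", "new-session", "-d", "-s", name, command], check=True)
    return True


if __name__ == "__main__":
    started = ensure_session("julia-agent")  # hypothetical session name
    print("started new session" if started else "reusing existing session")
```

Reusing the session is what keeps compiled code and loaded packages warm between agent calls, so the check-before-create order matters more than the exact commands.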

I have started bundling package-maintenance tasks into skills. I would rate these as “maturing” rather than “mature,” as I am still regularly finding ways to improve the skills to ensure higher-quality results. I’ve also made some effort to improve context-efficiency by being specific about certain instructions (“use a subagent to …” or “don’t read the source, extract this from Base.Docs.meta(MyPackage) using the following script:”).

Repo: timholy/claude_config on GitHub (configuration files for Claude Code)

Would you share it, please?

I just verified that a prompt along the lines of “start a tmux session, write to it using tmux send-keys, and read the response using tmux capture-pane” works. But it’s not a genuinely interactive session, and it seems inefficient and brittle: by default capture-pane returns the whole visible screen and nothing more, so the LLM has to spend its context window diffing two capture-pane “states”. This looks harder than it should be.
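One way to keep that diffing out of the model’s context is to do it in a small helper, so the agent only ever sees the new lines. The sketch below is a heuristic, not a robust solution. It assumes both snapshots come from something like `tmux capture-pane -p`, and the function name is made up. It looks for the largest overlap between the end of the previous snapshot and the start of the current one, which also copes with the screen having scrolled.

```python
def new_lines(previous: str, current: str) -> list[str]:
    """Return the lines of `current` that were not already in `previous`."""
    prev = previous.rstrip("\n").splitlines()
    curr = current.rstrip("\n").splitlines()
    # The last line of the old snapshot is the live prompt line. It gets
    # rewritten once input is typed, so leave it out of the comparison.
    prev = prev[:-1]
    # Largest k such that the last k old lines equal the first k new lines.
    for k in range(min(len(prev), len(curr)), 0, -1):
        if prev[-k:] == curr[:k]:
            return curr[k:]
    # No overlap found: treat everything on screen as new.
    return curr


if __name__ == "__main__":
    before = "julia> x = 1\n1\n\njulia>\n"
    after = "julia> x = 1\n1\n\njulia> 1 + 1\n2\n\njulia>\n"
    print("\n".join(new_lines(before, after)))
```

Repeated identical output can fool the overlap search. A more careful version might print a unique marker before each command and cut the capture at that marker.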

I just uploaded the skill and scripts here: Satvik/julia-repl-skill on GitHub

But I would say the specific skill is less important than the process. What you really want to do is have Claude write the skill, try it out, and then suggest improvements when you see Claude struggling. For example, in the first version I saw it had a really hard time figuring out when the REPL was done processing, so I ended up having it write a separate Python script, wait-julia.
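For readers wondering what a "wait until the REPL is done" helper might involve, here is a guess at the general idea. It is not the wait-julia script mentioned above. It assumes the REPL counts as idle when the last non-empty line of the pane is a bare `julia>` prompt, and it takes a `capture` callable (for example, one wrapping `tmux capture-pane -p`) so the polling logic stays independent of tmux.

```python
import time


def repl_is_idle(pane_text: str, prompt: str = "julia>") -> bool:
    """True when the last non-empty line of the pane is just the prompt."""
    lines = [line.rstrip() for line in pane_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == prompt


def wait_for_prompt(capture, timeout: float = 60.0, interval: float = 0.2,
                    sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll `capture()` until the REPL looks idle, then return the pane text.

    Raises TimeoutError if no bare prompt shows up within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        text = capture()
        if repl_is_idle(text):
            return text
        if clock() >= deadline:
            raise TimeoutError("Julia REPL did not return to the prompt in time")
        sleep(interval)


if __name__ == "__main__":
    # Simulated pane snapshots: busy first, then back at the prompt.
    frames = iter(["julia> sleep(1)\n", "julia> sleep(1)\n\njulia> \n"])
    print(wait_for_prompt(lambda: next(frames), interval=0.0))
```

The check is only meaningful once the command has been echoed into the pane. Polling immediately after sending input can still see the old idle prompt, so a short delay or an echo check before the loop is probably needed in practice.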

When the cost of verification is an edit (hot-patched with Revise) plus a single tool call to evaluate a function or expression in a REPL, these agents get crazy good at writing Julia. It’s the same principle as in software engineering, where a mistake caught early is cheap compared to one caught late. There is a huge difference between having an agent write some code and test it when it’s done, and having the agent verify things at each step. When I switched to this style of workflow, the Julia code I got back that was supposed to be non-allocating suddenly was, and it was also significantly more likely to one-shot the task.

I second that. I’ve been experimenting with Claude Code, and had it write small programs entirely on its own. The programs did the job and did it well. Burned through my 5-hour session quota rather fast, though (Opus 4.7).

How does everyone use AI to produce notebook-like reports (with code + plots etc.)? Ideally it should be compatible with long-running Julia sessions (like julia-mcp, which works well) to avoid paying JIT costs over and over. Jupyter and Pluto have their own idiosyncrasies, and it looks messy to ask an agent to create a Pluto file ex nihilo. I’ve been trying with Literate.jl, but it relies on Quarto to produce HTML files (QuartoRunner hasn’t worked out for me on Windows). That’s a fresh Julia process, so each iteration/tweak requires paying the JIT startup cost.

I have a little side effort I’ve been playing with which might tickle some of these desires. Stay tuned; I can share something soon. It builds this type of functionality on top of Kaimon.jl and integrates with any agentic AI that supports MCP.

I also second that. Six months ago these tools produced code that called nonexistent functions and packages, and it was not always even syntactically correct. But I’m now a Claude Max client and I’m speechless. Don’t get me wrong, I’m still an expert in my field and a decent programmer, so I can guide the tool well. But it’s an amazing tool.

I’ve been using Claude Code a lot, and it hadn’t crossed my mind that I could speed things up like that. I guess on a modern CPU it’s not that bad, and the time spent “thinking” is an order of magnitude higher than the tool calling. Not to mention my time reviewing what CC is doing. Note: I mention CC because I have a (very good) experience with it, but its main competitor is supposed to be just as good.

Have you tried it with Kaimon.jl yet?

Currently, it seems like Claude Code can’t really execute individual cells; it has to rerun the whole notebook. So I have it use the REPL to do all the initial work (define and test the functions, etc.), and then have it make a Jupyter notebook and execute it at the end.
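A sketch of the "make a Jupyter notebook at the end" step, using only the Python standard library to write nbformat 4 JSON. The kernel name `julia-1.10` is an assumption (it depends on the installed IJulia kernel), and the function name and cell contents are placeholders.

```python
import json


def build_notebook(cells, kernel_name="julia-1.10"):
    """Build an nbformat 4 notebook dict from (cell_type, source) pairs.

    kernel_name is an assumption: use whatever `jupyter kernelspec list`
    reports for the local Julia kernel.
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells start unexecuted, with no outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": "Julia",
                "language": "julia",
            }
        },
        "nbformat": 4,
        # Minor version 4 avoids the per-cell "id" field required from 4.5 on.
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    nb = build_notebook([
        ("markdown", "# Report"),
        ("code", "x = 1 + 1"),
    ])
    print(json.dumps(nb, indent=1))
```

The printed JSON can be saved as, say, report.ipynb and then run in one go with `jupyter nbconvert --to notebook --execute --inplace report.ipynb`. That still starts a fresh kernel, so it fits the "execute once at the end" pattern rather than iterative use.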

Not yet

CC will take hints from the linter in VS Code, though, so I find it doesn’t actually run the julia binary that often.

Thirded. The fact that the agent can inspect the LLVM/ASM output alone is clutch. Think local Compiler Explorer.