[ANN] MCPRepl.jl -- share your REPL with your AI Agent

I am experimenting a bit more with coding agents lately, especially Claude Code. To circumvent the TTFX issues, I am trying to teach Claude to use Julia the way I use it: through the REPL. For that, I’ve created a small package, MCPRepl.jl, whose purpose is to expose your REPL via the MCP protocol, i.e., coding agents can send code to your REPL: both you and the agent share the same state.
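
For context on the wire format: MCP messages are JSON-RPC 2.0, so a tool call is essentially an HTTP POST with a tools/call payload (real clients also perform an initialize handshake first). Here is a minimal sketch of what a client might send; the port, endpoint path, and tool name are assumptions for illustration, not documented defaults of MCPRepl.jl:

using HTTP, JSON3

payload = Dict(
    "jsonrpc" => "2.0",
    "id" => 1,
    "method" => "tools/call",  # standard MCP method for invoking a tool
    "params" => Dict(
        "name" => "julia-repl-exec",             # hypothetical tool name
        "arguments" => Dict("code" => "1 + 1"),  # code to evaluate in the REPL
    ),
)

resp = HTTP.post("http://localhost:8000/mcp",    # port and path are assumptions
    ["Content-Type" => "application/json"],
    JSON3.write(payload))
println(String(resp.body))  # JSON-RPC response carrying the text/plain output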

The results are… well, sometimes great, sometimes not. All work in progress. But I thought I’d share – might be of value to someone else!

:warning: Disclaimer: This tool opens a TCP port and executes arbitrary code it receives. This is incredibly stupid and unthinkable outside the AI coding agent YOLO world, so be warned!

The key problems I identified so far:

  • The MCP connection may break. Claude expects a number of auth endpoints on the server; MCPRepl.jl tries to fake them.
  • Claude has a very bad concept of environments, modules, and so on. It will sometimes try to include stuff from the src folder or similar. I think better prompting would help there.

Contributions welcome! I don’t know how much time and effort I want to put into this experiment, but if more people contribute and we get the prompts right to improve Claude’s usage of that tool, it might become really helpful.

Similar Packages

  • ModelContextProtocol.jl offers a way of defining your own servers. Since MCPRepl.jl uses an HTTP server, I decided not to go with this package.

  • REPLicant.jl is very similar, but the focus of MCPRepl.jl is to integrate with the user’s REPL so you can see what your agent is doing.

11 Likes

Very interesting! I built a similar tool but decided to use a Jupyter kernel, which has had the advantage of being pretty stable, but the disadvantage of not being very visible.

Interesting. Is that code published? Did you create an HTTP MCP server, or a server that is launched as an executable by the agent and communicates via stdin/stdout?

It’s not published yet… I’ll try to get it up on GitHub today or tomorrow (I need to strip out some company-specific details).

I used Python and FastMCP, mostly because Python has the reference implementation of the jupyter_client library. Then I created an MCP server with FastMCP that’s just a thin wrapper around those functions, e.g.

@mcp.tool()
def julia_execute(code: str, timeout: int = 300) -> str:
    """Execute Julia code and return the output.

    Args:
        code: Julia code to execute
        timeout: Execution timeout in seconds (default: 300)
    """
    # Body elided in the original post; per the description above it
    # forwards `code` to the Julia Jupyter kernel via jupyter_client
    # and returns the collected output as a string.
    ...

and with Claude Code it’s just claude mcp add -s user julia-kernel python mcp/julia_mcp_server.py, using stdio by default.

1 Like

I am curious where I can check out the MCP server :slight_smile: It is crazy: just recently I was also thinking that we should have something like this, and I posted on Slack that I wanted this tool :smiley:

Actually, this is even better: it can be started in any REPL. Just wow.

Hi @hexaeder and all,

I’ve been putting some work into my fork of this and have enhanced it significantly with a lot of VS Code-related features for Copilot. Here is a partial list of additions:

Bidirectional VS Code Communication: Implemented two-way communication between the MCP server and VS Code, enabling the server to execute VS Code commands and receive responses via HTTP callbacks with request tracking

Real-time Streaming Output (SSE): Added Server-Sent Events streaming for incremental output display, allowing AI agents to see long-running command output in real-time instead of waiting for completion (see the sketch after this list)

Enhanced Debugging Workflow: Added comprehensive debugging tools including breakpoint management, step controls (into/over/out), watch expressions, variable inspection, and debug session management - all controllable via MCP tools

LSP Integration (5 new tools): Added Julia Language Server Protocol capabilities including goto_definition, find_references, hover_info, document_symbols, and workspace_symbols for code navigation and intelligence directly from the MCP server

Package Management Tools: Added pkg_add and pkg_rm convenience wrappers with better error handling, plus improved Pkg.status() integration for environment inspection

Dynamic VS Code Command Execution: Integrated VS Code Remote Control extension allowing MCP tools to trigger any allowlisted VS Code command (file operations, terminal control, tasks, git, etc.) with optional response handling

Code Quality Tools: Added format_code (JuliaFormatter.jl) and lint_package (Aqua.jl) tools for automated code formatting and quality assurance testing

Port Configuration Flexibility: Fixed hardcoded port issues - server now properly uses configurable ports with environment variable support

Comprehensive Documentation: Added detailed workflow guides (DEBUGGING_WORKFLOWS.md, LSP_INTEGRATION.md), bidirectional testing documentation, and enhanced AI agent prompt instructions
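
To make the SSE item above concrete: the underlying idea is a long-lived text/event-stream response that flushes one data: event per output chunk. A minimal HTTP.jl sketch under assumptions (hypothetical port, standalone server rather than the fork’s actual code):

using HTTP

HTTP.serve("127.0.0.1", 8765; stream=true) do stream
    HTTP.setheader(stream, "Content-Type" => "text/event-stream")
    HTTP.setheader(stream, "Cache-Control" => "no-cache")
    HTTP.startwrite(stream)
    for i in 1:5                             # stand-in for incremental REPL output
        write(stream, "data: chunk $i\n\n")  # one SSE event per chunk
        flush(stream)
        sleep(0.5)
    end
end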

7 Likes

Looks great. Do you think that in the medium to long term, this might be the optimal approach and workflow for utilizing AI advice in coding? By the way, what’s your long-term goal, if you don’t mind me asking?

I’m not sure about ‘optimal’, but it is an interesting approach. Now that I’ve been playing around with it, I think that keeping the server running inside the Julia session is a bit of a downside. This is because it’s fairly common to need to relaunch Julia when big changes are happening to the code, even if Revise is used. There are still certain additions or restructurings that seem to leave the REPL in an outdated state.

In those cases, the AI agent needs to restart the REPL, which also kills the MCP server since it runs in the same process. I have added instructions so that the AI agent knows this will take place and needs to sleep or wait for a few seconds for it to come back up. It’s not a huge issue, but it’s a little annoying when it happens because it slows down the flow a bit, and sometimes the agent will give up and I have to prod it.

I’ve continued to play around with this and there’s significantly more functionality (and a little "fun"ctionality, at least during setup). I’ve worked a bit on security, to provide a mode that requires an API key in order for the agent to talk to the MCP server. With that addition I think it might be time to submit the package to the Julia repository and see what people make of it.
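
For the curious, the shape of such an API-key gate in HTTP.jl might look like the following. This is a minimal sketch under assumptions; the env var name and Bearer scheme are hypothetical, not necessarily the fork’s actual implementation:

using HTTP

const API_KEY = get(ENV, "MCPREPL_API_KEY", "")  # hypothetical env var

# Middleware: wrap a handler so requests must carry `Authorization: Bearer <key>`.
function require_key(handler)
    return function (req::HTTP.Request)
        auth = HTTP.header(req, "Authorization", "")
        if !isempty(API_KEY) && auth == "Bearer $API_KEY"
            return handler(req)
        end
        return HTTP.Response(401, "missing or invalid API key")
    end
end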

Another thing I’ve noticed is that while I’ve added lots of available tools, the agents I’ve been testing don’t always reach for them. I’ve added the LSP integration, which should in theory give the agent a much cleaner path to querying the code, but unless I specifically prompt for its use, they seem to prefer simply parsing the file or searching with grep on the command line instead of using the fancy tools.

I don’t know what my long term goal for this is, it’s been more of an experiment than anything. But I would like to see if others find it useful or interesting and see where that goes.

Did you try it out? I’d be interested to learn about your experience and success or failure.

1 Like

Thank you for the reply. I’m sorry, I haven’t tried it yet. I’m currently using Kiro, I’m in the middle of a project, and in general, I have limited resources allocated for these models. However, based on the information provided by @hexaeder and you, I get the feeling that it’s potentially really cool. So far, I’ve been using Cursor, Pearl, Windsurf, Trae, and Kiro. On the CLI side, I’ve been using Aider, Gemini, and Crush. I think there might be a sustained need for a solution like the one presented in this thread. It’s probably unlikely that large corporations will allow direct code execution on their servers as it might be too risky for them to implement at scale. However, I might be wrong.

Yes, recently, I was reading some forum posts about keeping Julia fully interactive when prototyping. My approach is mostly focused on chatting with these models and incrementally building the code by hand. I rarely use the “automatic builder mode” included in the IDEs I mentioned above. When I tried them, I found it much harder to understand the generated code, and the number of necessary corrections usually increased exponentially. I often ended up breaking my working codebase when using them, especially with Aider, Gemini, and Crush CLIs.

I think it would be great to register the package and give a broader audience the chance to test it. I’ll definitely do that after finishing my current project. My workflow is simple. The most important feature for me would be the agent’s ability to execute code and react immediately and automatically when something doesn’t work. Usually, I’m only interested in reviewing working suggestions, so my hope is to somehow filter out the noise. Another feature that’s important to me is keeping a safe working version of a reference file, editing a copy, and updating the reference file only when I’m 100% satisfied with the changes. This way, I can stay on top of what’s changing and produce a very high-quality codebase. My hope is that I’ll be able to replicate this workflow with your potential package or learn a new one that is even more suitable for me.

@hexaeder @kahliburke I re-read my post and realized that I did not intend to sound so dry. I simply wanted to reiterate my interest in your package.

I still have the MCPRepl package loaded by default in my Julia sessions and use it occasionally. In some situations, it really helped a lot, especially in writing test cases: now the AI can actually try them out to see if problems arise.
I think overall the stateful approach of Julia might not fit the LLMs so well. They have lots of problems with environments and environment management especially. Often, when things fail, the AI tries to get creative with REPL usage, burning tokens without doing anything useful.

Therefore, I keep the agent on quite a short leash: short tasks, frequent review within a session, and essentially just offloading tedious coding tasks but close to none of the software design. Is that the best way to use AI? Probably not. Sometimes I have fun and feel productive doing it; sometimes I go for days without using it at all.

I think it would be great to register the package and give a broader audience the chance to test it.

Well, no need to register it as you can just do

julia>] add/dev https://github.com/hexaeder/MCPRepl.jl

to try it out. Since this is
a) mostly a vibe-coded package and
b) an incredible security risk to give the agent a tool to execute arbitrary code,
it is just incompatible with my own standards for a “registered package”, so I won’t do it.

Another feature that’s important to me is keeping a safe working version of a reference file, editing a copy, and updating the reference file only when I’m 100% satisfied with the changes.

Sounds like a problem fully solved by git. Put your files under version control and you can always safely revert all changes made by AI.

1 Like

So just to be clear: let’s say you have a REPL where you’ve defined a variable df_total, and that variable does not yet appear in the code you are working on. With this package, can Claude suggest df_total in your code as you write?

Can you clarify what you think Claude will know about df_total? Does it know column names, for example?

I don’t fully understand your question, but I’ll try to give a short example of how you’d work with it. At its core, this MCP server just allows Claude to execute code in your REPL. It sends a command and gets the text/plain representation back, same as you do when you work with the REPL. So let’s say you have a dataframe df_total loaded in the REPL. You can prompt Claude something like:

You have df_total available in the julia-repl. Please check the contents of the dataframe and give me a summary.

Claude can then execute code like

agent> df_total
# gets the df pretty print
agent> names(df_total)
# gets the list of names in text/plain repr
agent> using Statistics  # needed for `mean`
agent> for n in names(df_total)
    println("mean of ", n, " = ", mean(df_total[:, n]))
end
# gets everything which was printed to stdout in that loop

and so on, to get an understanding of the dataframe and report back to you. You, as the user, will see the commands and the output in the REPL as if you had typed them yourself.
Since Claude can inspect the df_total object interactively, you could ask it stuff like

Write me a function which takes a dataframe of that format and filters out all the rows where size > 10 and name starts with a “U”.

My main motivation behind the package was that I do all my coding in this interactive style: mainly working in my test environment, writing functions in my package, writing tests in my test file, and executing them in the REPL (similar to shift+Enter in VS Code) until I get the desired result. Since I believe that is THE way of using Julia, I wanted to give the agent the same tools.
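
For readers who want to picture that loop, here is a rough sketch using Revise and TestEnv.jl (my choice of helper packages for illustration, not necessarily what the post describes; MyPkg is a hypothetical package):

julia> using Revise                        # picks up edits to package files
julia> using MyPkg                         # hypothetical package under development
julia> using TestEnv; TestEnv.activate()   # switch to MyPkg's test environment
julia> include("test/runtests.jl")         # re-run after each edit until green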

Since it is all in the same REPL context, if the agent were to execute

agent> a = :foo

you, as the user, would get

julia> a
:foo

afterwards.

At its core, this MCP server just allows Claude to execute code in your REPL. It sends a command and gets the text/plain representation back, same as you do when you work with the REPL.

Okay this answers part of my question. It doesn’t know the contents of df_total but it does know whatever it can learn from the printed output, which probably includes some column names.

But your response doesn’t answer a main part of my question, which is: let’s say I’m writing in Cursor and Claude is attached to it. Claude is aware of my REPL because I expose my REPL to it via an MCP server. Does this mean that Claude’s auto-complete will suggest things about df_total when I’m typing a script in Cursor?

I think overall the stateful approach of Julia might not fit the LLMs so well. They have lots of problems with environments and environment management especially.

I guess you’re at least partly referring to the issues discussed in that thread: Interactive prototyping workflows - #36 by kousu

Therefore, I keep the agent on quite a short leash: short tasks, frequent review within a session, and essentially just offloading tedious coding tasks but close to none of the software design. Is that the best way to use AI? Probably not.

As I mentioned, I chat with it, and gradually it’s building my entire codebase. I also try to keep things under control, especially for the performance critical parts. What I find interesting about your package is that, as I understand it, it allows the agent to be immediately aware of the results of its own advice. My initial hope was that instead of me having to test several iterations of its suggestions manually, the agent could do that in the background and present me with either a working solution or a few viable working options. Then I could refine the chosen solution further and, at the end, ask for performance improvements. At that stage, the process would start again, it would test, iterate, and finally produce a version that I could accept.

Well, no need to register it as you can just do julia>] add/dev … to try it out.

Yes, I’m aware of that method. I just wanted to be polite. :-)

> Another feature that’s important to me is keeping a safe working version of a reference file, editing a copy, and updating the reference file only when I’m 100% satisfied with the changes.

Sounds like a problem fully solved by git. Put your files under version control and you can always safely revert all changes made by AI.

Well, yes and no at the same time. For me, there’s a difference between manually reverting incorrect changes and having a system that manages state and direction more intelligently.

EDIT: I just wanted to add that I wrote this as my best attempt at a generalization based on my experience with these models. However, I think coding is much more complex. One thing I’d stand by is the importance of verifying the model’s advice: manually testing incorrect suggestions (even though they’re less common than a few months ago) is really a waste of time.

Well, it knows nothing about that object, not even that it exists. But if you tell it that it exists, it can learn every last detail, including all the data, by querying it in the REPL.

Regarding cursor auto-complete: I don’t think it will work out of the box.

I think in order to allow for this, one would need to extend the functionality of the REPL MCP server. For example, there could be a “cheap” MCP tool get-names which returns all symbols currently known to the REPL. In the description of that tool, one would need to tell the agent that it may check whether some object it lacks context for is defined in the REPL. If so, it can query that object to learn more about it.
I guess it really depends on how the autocomplete in Cursor works and whether you can teach Cursor to use additional tools when collecting autocomplete context.
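
A sketch of what such a hypothetical get-names tool could do on the Julia side (the tool name and the filtering are my assumptions):

# List the symbols currently defined in Main, dropping the default bindings,
# and return them as a text/plain payload for the MCP response.
function get_names()
    defaults = (:Base, :Core, :Main)
    syms = filter(s -> s ∉ defaults, names(Main))
    return join(string.(syms), "\n")
end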

I’m not sure I understand still. My mental model is that my Cursor tab, containing my script, is seeing a .txt file that is constantly updating. This .txt file is just my REPL history: all my input commands (julia> df_total = DataFrame(...)) and the output, i.e., what is printed to stdout.

So under that mental model, Cursor knows what df_total is, more or less, by examining this .txt file of my REPL history.

How is the actual implementation different from that toy model?

The relationship is one-sided: the agent can send queries to the REPL; the MCP server never pings the agent. The agent essentially just sends a string to be evaluated and gets a string back. In pseudo code this looks like this:

julia-repl-exec: "a = [1,2,3]"

The MCP server executes that on the Julia side and returns

julia-repl-exec returned: "[1,2,3]"

That’s all the agent gets. For the user, that’s displayed as

julia> foo = 1 # user executed code before agent did something
1

agent> a = [1,2,3]
[1,2,3]

julia> bar = 2 # user executed code after the agent did something
2

The agent, however, does not know about the foo = 1 or bar = 2 commands. It could check by manually reading the history file or something like that. Alternatively, the agent could send another request to read all the defined variables in the REPL:

agent> names(@__MODULE__)
13-element Vector{Symbol}:
 :Base
 :Core
 :Main
 :a
 :bar
 :disable_debug_logs
 :disable_precompilation
 :enable_debug_logs
 :enable_precompilation
 :foo
 :mcprepl
 :restart
 :set_prempilation

where it could see that bar and foo have been defined. If those also appear in the script, it could send more commands to the REPL to inspect those objects.

So there is nothing like a big .txt file that contains all the interactions with the REPL.

Ah okay. I believe there is another Julia package which has the behavior I describe (essentially, the LLM intercepts every REPL input and every REPL output), but it doesn’t have the feature of your package where you can tell an agent living elsewhere (Cursor, for example) to do things to the Julia REPL.

What is the name of the other package if I may ask?

1 Like