Improving LLM-generated Julia code, especially for Makie visualizations

Hi everyone,

I’d like to ask for advice on how to improve the quality of Julia code generated by LLMs, and to spark a discussion about what the community could do to make LLMs more useful for Julia tasks.

What I’ve tried so far
I wanted to use a local LLM to create plots with Makie. To give it better context, I downloaded the entire Makie documentation and set up a RAG system using AnythingLLM + Gemma 4 MoE. Unfortunately, that didn’t help much – the generated code still looked very “Python-like” and rarely worked out of the box.

The frustration
Many AI coding agents, when asked to visualize some abstraction, produce Python or JavaScript code that actually runs. It’s clear that these models have been trained (or fine-tuned) on enough examples that they can even self-correct when they hit a problem, resulting in a working solution right away. I’d love to see something similar happen for Julia – ideally a tool that can generate ready-to-use Pluto or Bonito notebooks, or at least produce working scripts for Makie visualizations.

Question for the community
What would it take to get there? Could we create a large, high-quality public dataset of Julia visualization code (and more generally, idiomatic Julia examples) and fine-tune a local LLM on it? Are there already efforts in this direction that I could contribute to? I’d really appreciate any suggestions, experiences, or pointers to existing projects.

I have already read similar topics, but maybe it’s time for an update?

Thanks in advance!

This might be an annoying answer (because it is an expensive one), but your biggest problem is probably that the tool you chose is underpowered. The SOTA agents (Claude Opus 4.7, Codex 5.5) are much, much better. Especially with thinking turned to max, I already get quite good Julia code out of them.

But maybe the topic “[Help Wanted] Help contribute test cases to improve LLM performance on Julia code” will interest you, if you have particular workloads that agents have really struggled with?

I think you are right about the dominance of Python in the training data of most models. Getting rid of that bias would probably require training a model from scratch on a Julia-only corpus, say the entire registry. While that’s possible in theory, very few of us have the near-petabyte storage potentially required.

So the task becomes how to prompt better, so that the model iteratively translates its initial output into idiomatic Julia. That’s something I’d be happy to work on with you using my own local models. Can we start with a few of your MWEs?
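
To make the idea concrete, here is a minimal sketch of a generate, run, repair loop; it is a starting point under stated assumptions, not a finished tool. It is written in Python purely as glue code. `generate` is a hypothetical stand-in for whatever call reaches your local model, `repair_loop` and `run_julia` are names I made up for illustration, and the runner assumes a `julia` binary on your PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_julia(code: str, timeout: int = 300) -> Tuple[bool, str]:
    """Run `code` as a script with the `julia` binary (assumed to be on PATH).

    Returns (succeeded, stderr_text).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.jl"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                ["julia", "--startup-file=no", str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout} s"
    return proc.returncode == 0, proc.stderr


def repair_loop(
    task: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, Julia source out
    run: Callable[[str], Tuple[bool, str]] = run_julia,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, run it, and feed any error back until it runs or rounds run out."""
    prompt = task
    for _ in range(max_rounds):
        code = generate(prompt)
        ok, err = run(code)
        if ok:
            return code
        # Show the model its own attempt plus the real Julia error message.
        prompt = (
            f"{task}\n\nYour previous attempt:\n{code}\n\n"
            f"It failed with this Julia error:\n{err}\n\n"
            "Return a corrected, idiomatic Julia script only."
        )
    return None  # no running script within max_rounds
```

Note that "runs without error" is only a proxy: a script can finish cleanly and still draw the wrong plot, so your MWEs would still need a human look (or a saved reference image) to judge the result.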

Out of curiosity, has anyone published some Claude Code skills yet? I’m looking at the MCP developers here; it seems like a natural progression of this.