Improving LLM-generated Julia code, especially for Makie visualizations

This might be an annoying answer (because it is an expensive one), but your biggest problem is probably an underpowered choice of tool. The SOTA agents (Claude Opus 4.7, Codex 5.5) are much, much better. Especially with thinking turned to max, I already get quite good Julia code out of them.

But maybe the thread "[Help Wanted] Help contribute test cases to improve LLM performance on Julia code" will interest you, if you have particular workloads that agents have really struggled with?