Ever since I started working with Sonnet 3.5, I have felt that the LLM is remarkably skilled in the Julia programming language. I could even believe that, thanks to Julia's descriptive power and Python-like simplicity, Sonnet 3.5 could handle Julia on the same scale as any other language, BUT with exceptionally higher accuracy.
Could we get Julia onto this leaderboard if we implement one of its benchmark datasets in Julia-LLM-Leaderboard? Big Code Models Leaderboard - a Hugging Face Space by bigcode
(Side note: given its descriptive power, Julia could become the first language that could actually be developed by LLMs.)