Vectorized languages like J (and APL) do even better, which leads me to believe Julia could also do well. In Julia you can write vectorized code or not (loopy), or even mix the two, and models generating Julia may well default to the non-vectorized style. Possibly we should ask for vectorized code by default, or at least let the AI consider it.
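To make the contrast concrete, here's a minimal sketch in Python/NumPy (standing in for Julia, which has analogous loop and broadcast styles; the function names are just illustrative):

    import numpy as np

    # Loopy style: explicit iteration, several lines, more tokens to generate.
    def scale_loop(xs, k):
        out = []
        for x in xs:
            out.append(x * k)
        return out

    # Vectorized style: one broadcast expression, far fewer tokens.
    def scale_vec(xs, k):
        return np.asarray(xs) * k

Both compute the same thing; the vectorized version is simply much less text for the model to emit.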
Update: A lot of people asked about APL. I reran on a smaller set of like-for-like coding tasks - it came 4th at 110 tokens. Turns out APL’s famous terseness isn’t a plus for LLMs: the tokenizer is badly optimised for its symbol set, so all those unique glyphs (⍳, ⍴, ⌽, etc.) end up as multiple tokens each.
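The glyph fragmentation is easy to check yourself. A small sketch using OpenAI's tiktoken library - I'm assuming the cl100k_base vocabulary here; exact counts will differ for other tokenizers:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    # Rare Unicode glyphs fall back to byte-level pieces in a BPE vocabulary,
    # so a single APL symbol can cost several tokens, while common ASCII
    # characters are typically one token (or merge into larger ones).
    for ch in ["⍳", "⍴", "⌽", "a", "+"]:
        print(repr(ch), len(enc.encode(ch)))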
Update 2: A reader reached out about J - a language I’d never heard of. It’s an array language like APL but uses ASCII instead of special symbols. It dominates at just 70 tokens on average, about two-thirds of Clojure’s 109 tokens. Array languages can be extremely token-efficient when they avoid exotic symbol sets. If token efficiency turns out to be a key driver, this is perhaps a very interesting way for languages to evolve.
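The same kind of check works for J's ASCII spelling. Summing an integer range is +/⍳10 in APL and +/ i. 10 in J - my own example expressions, valid as far as I can tell, and essentially the same idiom modulo index origin - so one can compare what the tokenizer makes of each (again assuming tiktoken/cl100k_base):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    # Essentially the same idiom in each language: sum over an integer range.
    snippets = {"APL": "+/⍳10", "J": "+/ i. 10", "Python": "sum(range(10))"}
    for lang, src in snippets.items():
        print(lang, len(enc.encode(src)))

The APL glyph should fragment into several tokens, while the J and Python versions stay in cheap ASCII territory.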
[For those who don’t know, J is an APL variant; I’m not sure what those “unique glyphs (⍳, ⍴, ⌽, etc.)” map to in J exactly, but I assume more than a single ASCII letter each. In a legacy APL code page each glyph is a single byte, but that’s irrelevant to LLMs: tokenizers see the Unicode text (UTF-8 bytes, for byte-level BPE), where each glyph is several bytes - the question is whether it still collapses to a single token. So J might actually need more characters for the final code, yet fewer tokens. Or maybe APL’s relative inefficiency is just about it being less fine-tuned on? I’d still guess J is the more fine-tuned-on, higher-resource language of the two.]
Unsurprisingly, dynamic languages were much more token efficient (not having to declare any types saves a lot of tokens) - though JavaScript was the most verbose of the dynamic languages analysed.
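A rough way to see the type-annotation tax is to count tokens for the same function with and without annotations (same tiktoken/cl100k_base assumption as above):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    typed = "def add(a: int, b: int) -> int:\n    return a + b"
    untyped = "def add(a, b):\n    return a + b"
    # The annotated version carries extra tokens for each ': int' and '-> int'.
    print(len(enc.encode(typed)), len(enc.encode(untyped)))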
I tended to believe JavaScript was one of the best languages for genAI, likely because it’s a high-resource language like Python, and also just much used (for web work, many don’t even consider there being any alternatives). This token-efficiency test is maybe too limited, though: it ignores library/framework use, which I’d guess the AI knows about and exploits, e.g. for web use. Also, it only measures getting to a correct solution, without taking runtime into account, so there are other metrics too.
I would also like to know how well another APL variant, BQN, works - and Forth and other concatenative languages, like Factor and Kitten. Anyone have a good or bad feeling about using those? AI itself thinks such concatenative/tacit languages are genAI-hostile, but not APL-style tacit languages…