[ANN] Julia LLM Leaderboard - Help us make it more relevant for everyday problems!

I can’t wait to try this one: Phind

Phind didn’t pass my usual test (so far, no LLM has). I always ask: “In Julia, do slices behave as copies or views?” It gave the wrong answer.
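
For context, here is a minimal sketch of the behavior that question is probing, as I understand Base Julia arrays (the variable names are just for illustration): a slice on the right-hand side makes a copy, `@view` gives a view that shares memory, and a slice on the left-hand side of an assignment writes in place.

```julia
a = [1, 2, 3, 4]

# Indexing with a range on the right-hand side allocates a new array (a copy).
b = a[1:2]
b[1] = 100
# `a` is unchanged here: a[1] is still 1.

# `@view` (or `view(a, 1:2)`) creates a SubArray that shares memory with `a`.
v = @view a[1:2]
v[1] = 200
# Now a[1] is 200, because `v` writes through to `a`.

# On the left-hand side of an assignment, a slice writes into `a` in place.
a[3:4] .= 0
# `a` is now [200, 2, 0, 0]; `b` is still the independent copy [100, 2].
```

So a fully correct answer has to be "it depends on where the slice appears", which is probably why it trips models up.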


It’s not a full release yet (there is still a bit of work left), but I wanted to share some preliminary findings about the new Anthropic models.

It turns out that Claude-3 Haiku has taken the value-for-money lead previously held by GPT-3.5-Turbo.

I’ve been really impressed by Claude Opus, but the real star of the show here is Claude Haiku!
I paid $30 for the Opus evals, but only $0.40 for the Haiku evals!
(Opus also had really poor availability, so I had to restart many times.)

Btw, you can now also use Anthropic models for data extraction/function calling (released today). It’s available in v0.18 of PromptingTools, among other things.

What’s next? An exciting GSoC project (hopefully), new test cases, new models, and a new category of evals… A lot to look forward to.
