Has anyone heard of TOON format? Looking for Julia implementation

Hi everyone,

I recently came across TOON (Token-Oriented Object Notation), which is described as a data format designed to be more LLM-friendly than JSON. The format aims to reduce token usage while maintaining readability and type safety.

Some interesting features:

  • Compact syntax that reduces LLM token consumption
  • Built-in type annotations
  • Object-oriented structure
  • There’s an interactive playground to explore how it compares to JSON in terms of tokenization (see the rough example after this list)
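
To give a rough sense of the idea (going by the examples in the TOON README, so treat the exact syntax as indicative rather than authoritative): where JSON repeats every field name in every object, TOON states the fields once and then lists the rows.

    JSON:
    {
      "users": [
        { "id": 1, "name": "Alice", "role": "admin" },
        { "id": 2, "name": "Bob", "role": "user" }
      ]
    }

    TOON (roughly):
    users[2]{id,name,role}:
      1,Alice,admin
      2,Bob,user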

There’s already a Python implementation available, which got me thinking about Julia.

Has anyone:

  1. Heard of or worked with TOON format?
  2. Started (or would be interested in) a Julia implementation?

Given Julia’s strengths in parsing and the growing interest in LLM applications, it seems like TOON could be a useful addition to the ecosystem. I’m curious if there’s any existing work in this direction or if others think this would be a valuable package to develop.

Would love to hear your thoughts!

1 Like

Just saw it in a forum unrelated to Julia yesterday. Has it been going viral in the past few days?

2 Likes

I saw this idea in a LinkedIn post by Johann Schopplich, “Case Study: How we at Finanzfluss save ~50% in LLM costs with a new data format” (original in German), and in some other posts on LinkedIn.

I’m currently implementing a Julia package (using Claude Code and Kiro.dev).

Status: work in progress, not yet registered, but with quite good code coverage (more than 2,000 tests based on the official spec test suite).

4 Likes

Just a heads-up to avoid frustration when you do feel ready to register it: the package name TOON.jl will probably not make it into the General registry (too short/all-caps). I’d recommend writing the acronym out as TokenOrientedObjectNotation.jl. You should also be aware of the guidelines for LLM usage for registered packages.

4 Likes

There are JSON.jl and JSON3.jl, though, aren’t there?
I didn’t know the term “AI slop”. Thanks for pointing to it in the guidelines for LLM usage.

1 Like

There are lots of packages with short names in the General registry, but the naming guidelines have evolved over time and old decisions don’t serve as precedent for new packages.

That said, if TOON were as ubiquitous as JSON it would likely be accepted today, but at this point there is no way to tell whether it will eventually overtake JSON or be all but forgotten in half a year.

7 Likes

Yes, and the joke is that the LLM bros have re-invented CSV xD

7 Likes

Indeed, TOON is a meme on X/Twitter already. Incredibly, the vibe-coders have re-invented a (worse) version of CSV from first principles and made it viral on LinkedIn (the professional social network where no professional software developers exist to explain to them what they’ve done).

6 Likes

There is also TONL (Token-Optimized Notation Language).
See https://tonl.dev/
Repository: tonl-dev/tonl on GitHub

  1. If you quickly want to build something in Julia using TOON, you could use the Python library via PyCall / PythonCall / etc. That way you can get up and running quickly if your goal is just to evaluate TOON (see the sketch after this list).

  2. To the extent that TOON works right now, it’s presumably a combination of models being comfortable with YAML and CSV. But in the long run, the “correct” protocol to communicate with models is always a function of what they’ve been pretrained/finetuned to. So, unless the major model players adopt it as a first-class protocol, TOON or whatever other protocol is unlikely to beget maximum performance. This is one of those cases where the innovation has to come through the model providers and it’s not just a question of end-users adopting something based on vibes. (Just my two cents, YMMV)

  3. If the main win of TOON is avoiding repeated field names by compressing into a CSV-like format, it’s presumably easy to cook up something like that with JSON and get most of the benefits (see the second sketch after this list). Or give models literal CSV+JSON+YAML content; but whatever… there are many ways to paint the bikeshed.
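
For item 1, here is a minimal sketch of what calling the existing Python implementation from Julia could look like via PythonCall. The Python module name and the encode/decode functions below are assumptions on my part; check the actual package’s documentation.

    using PythonCall

    # NOTE: "toon" as a module name and encode/decode as function names are
    # hypothetical; the real Python TOON library may expose a different API.
    toon = pyimport("toon")

    data = pydict(Dict("id" => 1, "name" => "Ada", "admin" => true))

    encoded = toon.encode(data)          # assumed: Python dict -> TOON string
    println(pyconvert(String, encoded))

    decoded = toon.decode(encoded)       # assumed: TOON string -> Python dict
    @show pyconvert(Dict, decoded)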
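
And for item 3, a sketch of the “state the field names once” idea in plain Julia, next to the JSON that JSON3.jl produces. This is not TOON itself, just an illustration of the key-compression trick it revolves around (the example data is made up).

    using JSON3   # assuming JSON3.jl is installed

    rows = [
        (id = 1, name = "Alice", admin = true),
        (id = 2, name = "Bob",   admin = false),
    ]

    # Plain JSON repeats every field name in every object:
    println(JSON3.write(rows))

    # A CSV-like encoding states the field names once, then emits value rows:
    fields = collect(keys(first(rows)))
    header = join(fields, ",")
    body   = join((join((string(r[f]) for f in fields), ",") for r in rows), "\n")
    println(header * "\n" * body)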

(Okay, I’m getting baited and can’t help myself.) If one really cares about efficient communication with minimal token count, then maybe the best way would be to look into the plethora of serialization formats for storing/transmitting data, find the most suitable one, and then train whatever AI model to become fluent in it.