Non-code analysis documentation - TOML, YAML, something else?

I am toying with the idea of keeping config files / pipeline documentation in a format that humans can easily read and write. Ideally it would be language agnostic, so that I could later write e.g. a parser in Python to run the same steps there.

I thought Julia was all-in on TOML files, but recently I saw some discussions on the pros and cons of TOML, YAML, etc. So I was wondering what your general preference is, or whether there is an important difference between these two.
Or maybe there is something new that would be worth looking into?

How complex are your needs? TOML and YAML are both fine, until they’re not.

Not that complex, I think. Right now, I can’t think of a situation where I would need more than 3 levels of depth, and the whole thing typically wouldn’t be longer than 100 lines or so.
Yeah, I read through a blog post about how bad TOML is, but I got the impression that it hinges on technicalities that only become noticeable at scale.
There was some shift in my field to use JSON for such sidecar purposes, but I don’t enjoy the amount of braces it entails. :wink:

I think you probably mean you read a blog post about how bad YAML is. YAML is very complex and has a lot of weird sharp edges. TOML is simple and reasonable, the only real issue being that it can be a bit verbose if you nest deeply. Julia uses TOML for all its configuration files. YAML is only used to configure third party CI stuff that requires it and isn’t parsed by Julia itself. JSON is very common and is simple but not especially friendly to read or write in my opinion.

It was this post (shared on the Julia Slack’s random channel):
https://hitchdev.com/strictyaml/why-not/toml/
(but written by someone who is designing their own version of YAML, I think?)

Up to now I was not aware of problems with either format and was planning on using TOML, since it is so ubiquitous in Julia. But I thought I would ask anyway.

This post was recently on the front page of Hacker News too. The discussion there includes a lot of criticism of the post:

I disagree with most of those points…

In this example of a StrictYAML story and its equivalent serialized TOML the latter ends up spending 50% more characters to represent the exact same data.

50% more characters!!! Oh no! :scream_cat: Eh, who cares? I do not. I’ll take a little extra verbosity in exchange for clarity any day. YAML leans hard into conciseness at great cost to clarity, which invites confusion. This is also an ironic argument coming from a Python person, since Python consistently favors verbose clarity over conciseness.

TOML’s hierarchies are difficult to infer from syntax alone

This is a weird one since the very thing that’s making TOML more verbose also makes the structure easy to decipher. Yes, using insignificant indentation can help make the structure even clearer, but that’s ok—now you have two independent ways to understand the structure—syntax and indentation. I get how Python people might be inclined towards significant indentation instead, but every time I have to use Python and I try to cut and paste example code into a Python REPL and it just won’t f@¢&ing work, I’m reminded of why significant indentation is a bad idea: indentation alone is brittle and easy to break or get subtly wrong. Best to let it reinforce syntactic structure rather than being the sole source of structure.

Overcomplication: Like YAML, TOML has too many features

I do agree that TOML has a few too many features. I don’t think dates and times should be in the format, for example. But YAML has soooo many excessive features. This is like comparing a hunting knife and a bazooka and saying “well, they’re both dangerous”. Sure, but one of them is a heck of a lot more dangerous and equating them is kind of disingenuous.

Syntax typing

I dunno, StrictYAML having all values be strings and forcing the programmer to parse and interpret them everywhere seems terrible to me. Almost every programming and data language has literals of different types, and I think that’s reasonable and good; it would take very compelling evidence to convince me otherwise. Everything being a string feels like something unfortunate that StrictYAML is forced into by YAML’s poor design choices (is NO a string or a false Boolean?), which they’re trying really hard to convince you is actually a good thing.
