TextFormats parser generator

I came across this library, ggonnella/textformats on GitHub, and the accompanying paper, "TextFormats: Simplifying the definition and parsing of text formats in bioinformatics".

They define a way to describe a text-based file format (like FASTA, SAM) in YAML / JSON and generate readers, writers and validators based on that specification.
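To make the idea concrete, here is a minimal Python sketch of a spec-driven reader, writer and validator. The spec schema, the field names and the `make_codec` helper are all made up for illustration; this is not the TextFormats specification syntax or its API, just the general pattern of deriving all three from one declarative description.

```python
import json

# Hypothetical spec schema (NOT the TextFormats syntax): a delimited record
# with named, typed fields, loosely modelled on the first SAM columns.
SPEC_JSON = """
{
  "separator": "\\t",
  "fields": [
    {"name": "qname", "type": "str"},
    {"name": "flag",  "type": "int"},
    {"name": "rname", "type": "str"},
    {"name": "pos",   "type": "int"}
  ]
}
"""

# Map type names used in the spec to Python constructors.
CASTS = {"str": str, "int": int}


def make_codec(spec):
    """Build (decode, encode, is_valid) functions from a spec dictionary."""
    sep = spec["separator"]
    fields = spec["fields"]

    def decode(line):
        # Split the line and convert each part to its declared type.
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
        return {f["name"]: CASTS[f["type"]](p) for f, p in zip(fields, parts)}

    def encode(record):
        # Write the fields back out in the order given by the spec.
        return sep.join(str(record[f["name"]]) for f in fields)

    def is_valid(line):
        # A line is valid if it decodes without error.
        try:
            decode(line)
        except ValueError:
            return False
        return True

    return decode, encode, is_valid


decode, encode, is_valid = make_codec(json.loads(SPEC_JSON))
record = decode("read1\t0\tchr1\t100")
```

The point of the design is that the format knowledge lives only in the data structure, so the same description could in principle be consumed from any language.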

This is done in Nim with bindings to C, C++, and Python. Do we have anything similar in Julia?

Not that I’m aware of - most of the parsers in BioJulia use Automa.jl, which generates state machines from regular expressions.
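As a rough analogy for the regex-driven approach, here is a small Python sketch that parses a flat, FASTA-like format from two regular expressions. It is not Automa.jl and does not mirror its API: Python's `re` module is a backtracking matcher rather than a generator of state-machine code, and the patterns and the `read_fasta` helper are simplified assumptions rather than a complete FASTA grammar.

```python
import re

# Simplified, assumed patterns: a header line and a sequence line.
HEADER = re.compile(r">(?P<identifier>\S+)(?:[ \t]+(?P<description>.*))?")
SEQUENCE = re.compile(r"[A-Za-z*-]+")


def read_fasta(lines):
    """Parse an iterable of lines into a list of record dictionaries."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        header = HEADER.fullmatch(line)
        if header:
            # Start a new record; a missing description becomes "".
            records.append({
                "identifier": header.group("identifier"),
                "description": header.group("description") or "",
                "sequence": "",
            })
        elif records and SEQUENCE.fullmatch(line):
            # Sequence lines are appended to the most recent record.
            records[-1]["sequence"] += line
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return records


records = read_fasta([">seq1 test record\n", "ACGT\n", "TTGA\n", ">seq2\n", "GG\n"])
```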

cc @jakobnissen

I don’t know of any Julia package that does this. But I’m also skeptical it will bring any value:

  • If the advantage is to automatically generate parsers from higher-level descriptions, then we have ParserCombinator.jl (which I haven’t tried) for nested formats and Automa.jl for flat formats.
  • If the advantage is that we could simply use format descriptions intended for other programming languages without needing to modify them, then I doubt that the bespoke format mentioned in that article, as opposed to, say, Backus-Naur form, would be more widespread.

Thank you for your comments @jakobnissen

I think there is value in a library of specifications of bio-data-formats in a form that can be used directly to generate parsers. The format they propose does not look terrible (to me), but the library is quite limited at this point. If it can grow to something like Kaitai it would be cool: https://formats.kaitai.io/.

See also
