[ANN] Norg.jl: A Norg file format parser for Julia

Hello, Julia discourse,

Today, I would like to introduce Norg.jl, a Norg file format parser written in Julia.

A few weeks ago, I took an interest in Neorg, a Neovim plugin inspired by Emacs org-mode. Neorg uses the Norg file format. To quote the specification:

The Norg syntax is a structured plain-text file format which aims to be human readable when viewed standalone while also providing a suite of markup utilities for typesetting structured documents. Compared to other plain-text file formats like e.g. Markdown, Org, RST or AsciiDoc, it sets itself apart most notably by following a strict philosophy to abide by the following simple rules:

  1. Consistency: the syntax should be consistent. Even if you know only a part of the syntax, learning new parts should not be surprising and rather feel predictable and intuitive.
  2. Unambiguity: the syntax should leave no room for ambiguity. This is especially motivated by the use of tree-sitter for the original syntax parser, which takes a strict left-to-right parsing approach and only has single-character look-ahead.
  3. Free-form: whitespace is only used to delimit tokens but has no other significance! This is probably the most contrasting feature to other plain-text formats which often adhere to the off-side rule, meaning that the syntax relies on whitespace-indentation to carry meaning.

Although built with Neorg in mind, Norg can be utilized in a wide range of applications, from external note taking plugins to even messaging applications. Thanks to its layers system one can choose the feature set they’d like to support and can ignore the higher levels.

With such an introduction, I thought that even someone with little experience writing parsers should be able to implement one. And indeed, it turned out sufficiently well for me to share this package!

The library has a wide range of tests, ensuring it behaves reasonably well for the set of features implemented, and offers code generation for Pandoc JSON and HTML, meaning you can use it to export Norg documents to any target format supported by Pandoc.

Implementation-side, Norg.jl uses JuliaSyntax’s Kind for representing token types and AST Nodes to keep code as type-stable as possible. Other than that, it uses a classical tokenization/matching/parsing system.
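To make the tokenization step concrete, here is a minimal sketch of the general idea, not Norg.jl's actual code. It assumes a plain `@enum` in place of JuliaSyntax's `Kind`, and the names `TokenKind`, `Token`, `kind_of` and `tokenize` are hypothetical. Because every token carries the same concrete type and its kind is just a value, downstream matching code can stay type-stable.

```julia
# Toy tokenizer for illustration only; all names here are hypothetical.
@enum TokenKind Word Whitespace Star LineEnding

struct Token
    kind::TokenKind
    text::String
end

# Classify a single character into a token kind.
function kind_of(c::Char)
    c == '*' && return Star
    c == '\n' && return LineEnding
    isspace(c) && return Whitespace
    return Word
end

# Group consecutive characters of the same kind into one token.
# Stars and line endings are always emitted one character at a time.
function tokenize(input::AbstractString)
    tokens = Token[]
    buf = IOBuffer()
    current = nothing
    for c in input
        k = kind_of(c)
        # Flush the pending token when the kind changes or the kind is single-character.
        if current !== nothing && (k != current || k == Star || k == LineEnding)
            push!(tokens, Token(current, String(take!(buf))))
        end
        print(buf, c)
        current = k
    end
    # Flush whatever is left at the end of the input.
    current !== nothing && push!(tokens, Token(current, String(take!(buf))))
    return tokens
end
```

Calling `tokenize("* Hello world\n")` on this toy version yields a star token, then alternating whitespace and word tokens, then a line ending. A matching stage would then inspect the sequence of kinds to decide, for instance, whether a leading star opens a heading.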

There are still a lot of features missing, such as link resolution, implementations of layers 3, 4 and 5, and semantic analysis. They will come in time as I keep working on the library.

You can have a look at the documentation for further details on Norg.jl.


Maybe separate parsing and conversion? I.e. do nf = parse(NorgText, string) to parse into some data structure, and then do Docs.HTML(nf) to construct HTML text, JSON(nf) to construct JSON, or Markdown.MD(nf) to convert to Markdown.

This will make it easier for other packages to extend, e.g. if someone wants to add an additional output format, or bidirectional conversion, or Norg output.
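A rough sketch of what that separation could look like, using multiple dispatch. Everything here is made up for illustration (`ToyDocument`, `parse_toy`, `OutputTarget`, `HTMLOut`, `PlainOut`, `render`) and is not Norg.jl's API. The "parser" just splits on blank lines, and the HTML output does no escaping.

```julia
# Hypothetical types for illustration; none of these come from Norg.jl.
struct ToyDocument
    paragraphs::Vector{String}
end

# Parsing step: split the text on blank lines into paragraphs.
function parse_toy(text::AbstractString)
    parts = split(strip(text), r"\n\s*\n")
    return ToyDocument(String[String(strip(p)) for p in parts if !isempty(strip(p))])
end

# Conversion step: one method per output target, all sharing the same parsed document.
abstract type OutputTarget end
struct HTMLOut <: OutputTarget end
struct PlainOut <: OutputTarget end

render(::HTMLOut, doc::ToyDocument) =
    join(("<p>" * p * "</p>" for p in doc.paragraphs), "\n")
render(::PlainOut, doc::ToyDocument) = join(doc.paragraphs, "\n\n")
```

With this shape, another package could add an output format by defining its own subtype of `OutputTarget` and a `render` method for it, without touching the parser.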


Will do! For now the AST structure is not that well documented, but you can already generate ASTs with the following syntax:

parse(Norg.AST.NorgDocument, s)

You can see an example of how this can be used here.