[ANN] Norg.jl: A Norg file format parser for Julia

klafyvel · November 11, 2022, 4:43pm

Hello, Julia discourse,

Today, I would like to introduce Norg.jl, a Norg file format parser written in Julia.

A few weeks ago, I took an interest in Neorg, a Neovim plugin inspired by Emacs org-mode. Neorg uses the Norg file format. To quote the specification,

The Norg syntax is a structured plain-text file format which aims to be human readable when viewed standalone while also providing a suite of markup utilities for typesetting structured documents. Compared to other plain-text file formats like e.g. Markdown, Org, RST or AsciiDoc, it sets itself apart most notably by following a strict philosophy to abide by the following simple rules:

Consistency: the syntax should be consistent. Even if you know only a part of the syntax, learning new parts should not be surprising and rather feel predictable and intuitive.

Unambiguity: the syntax should leave no room for ambiguity. This is especially motivated by the use of tree-sitter for the original syntax parser, which takes a strict left-to-right parsing approach and only has single-character look-ahead.

Free-form: whitespace is only used to delimit tokens but has no other significance! This is probably the most contrasting feature to other plain-text formats which often adhere to the off-side rule, meaning that the syntax relies on whitespace-indentation to carry meaning.

Although built with Neorg in mind, Norg can be utilized in a wide range of applications, from external note taking plugins to even messaging applications. Thanks to its layers system one can choose the feature set they’d like to support and can ignore the higher levels.

With such an introduction, I thought that even someone with little experience writing parsers should be able to implement one. And indeed, it turned out sufficiently well for me to share this package!

The library has a wide range of tests, ensuring it behaves reasonably well for the set of features implemented, and offers code generation for Pandoc JSON and HTML, meaning you can use it to export Norg documents to any target format supported by Pandoc.

Implementation-side, Norg.jl uses JuliaSyntax’s Kind for representing token types and AST Nodes to keep code as type-stable as possible. Other than that, it uses a classical tokenization/matching/parsing system.
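To give an idea of what the tokenization step of such a pipeline can look like, here is a minimal sketch. It is not Norg.jl's actual code: `TokenKind`, `Token`, `kind_of` and `tokenize` are hypothetical names, and a plain `@enum` stands in for JuliaSyntax's `Kind`.

```julia
# Toy sketch only: these names are hypothetical and do not mirror Norg.jl's
# internals (which use JuliaSyntax's `Kind` rather than an `@enum`).
@enum TokenKind Star Whitespace LineEnding Word

struct Token
    kind::TokenKind
    value::String
end

# Classify a single character into a token kind.
function kind_of(c::Char)
    c == '*' && return Star
    c == '\n' && return LineEnding
    isspace(c) && return Whitespace
    return Word
end

# Merge consecutive word or whitespace characters into one token.
# Stars and line endings always stay single-character tokens.
function tokenize(s::AbstractString)
    tokens = Token[]
    for c in s
        k = kind_of(c)
        mergeable = k == Word || k == Whitespace
        if mergeable && !isempty(tokens) && last(tokens).kind == k
            tokens[end] = Token(k, last(tokens).value * c)
        else
            push!(tokens, Token(k, string(c)))
        end
    end
    return tokens
end

tokens = tokenize("** Hello world\n")
println([t.kind for t in tokens])
```

Because every token carries the same concrete `Token` type and only the `kind` field varies, the token vector stays concretely typed, which is the kind of type stability a shared kind value is meant to provide.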

There are still a lot of features missing, such as link resolution, implementations of layers 3, 4 and 5, and semantic analysis, but they will come in time as I keep working on the library.

You can have a look at the documentation for further details on Norg.jl.

stevengj · November 11, 2022, 5:29pm

Maybe separate parsing and conversion? i.e. do nf = parse(NorgText, string) to parse to some data structure, and then do Docs.HTML(nf) to construct HTML text, or JSON(nf) to construct JSON, or Markdown.MD(nf) to convert to Markdown.

This will make it easier for other packages to extend, e.g. if someone wants to add an additional output format, or bidirectional conversion, or Norg output.
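A minimal sketch of this separation, under stated assumptions: `ToyDoc`, `to_html` and `to_markdown` are hypothetical names that exist neither in Norg.jl nor in Base, and the "parser" only recognizes `*`-prefixed headings. The point is only the shape of the API: `parse` returns a data structure, and each output format is a separate function of that structure.

```julia
# Toy illustration only: `ToyDoc`, `to_html`, and `to_markdown` are
# hypothetical names and are not part of Norg.jl.

# The parsed representation: a list of (level, title) headings.
struct ToyDoc
    headings::Vector{Tuple{Int,String}}
end

# Parsing: extend Base.parse so that `parse(ToyDoc, s)` returns a data structure.
function Base.parse(::Type{ToyDoc}, s::AbstractString)
    headings = Tuple{Int,String}[]
    for line in split(s, '\n')
        m = match(r"^\s*(\*+)\s+(.*)$", line)
        m === nothing && continue  # skip anything that is not a heading
        push!(headings, (length(m.captures[1]), String(strip(m.captures[2]))))
    end
    return ToyDoc(headings)
end

# Conversion: each output format is an ordinary function of the parsed document.
to_html(doc::ToyDoc) = join(("<h$l>$t</h$l>" for (l, t) in doc.headings), "\n")
to_markdown(doc::ToyDoc) = join(("#"^l * " " * t for (l, t) in doc.headings), "\n")

doc = parse(ToyDoc, "* Title\n** Section")
println(to_html(doc))
println(to_markdown(doc))
```

With this layout, another package could add a new output format by defining one more function that takes a `ToyDoc`, without touching the parser.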

klafyvel · November 11, 2022, 5:37pm

Will do! For now the AST structure is not that well documented, but you can already generate ASTs with the following syntax:

parse(Norg.AST.NorgDocument, s)

You can see an example of how this can be used here.
