[ANN] Pandoc.jl - Pandoc Interface and Pandoc Types in Julia

I just made a new release v0.4.0 of Pandoc.jl: https://github.com/kdheepak/Pandoc.jl

Pandoc is a Swiss Army knife for converting documents from one markup format (e.g. Markdown, HTML, LaTeX) to another (e.g. Markdown, EPUB, reStructuredText). You can read more about Pandoc here: https://pandoc.org/.

This Julia package provides 3 things:

  1. An interface to pandoc_jll v3.1 (JuliaBinaryWrappers/pandoc_jll.jl on GitHub), or to the pandoc CLI installed in your environment.
  2. A @kwdef struct that lets you construct a command line invocation of Pandoc:
help?> Pandoc.Converter
  This is a Converter options struct. It supports all of pandoc's command line arguments.

  You can use it like so:

  julia> run(Converter(; input = "# Header level 1"))
  "<h1 id="header-level-1">Header level 1</h1>
  "

  julia> c = Pandoc.Converter(; input = "# Header level 1")
  `pandoc`

  julia> c.from = "markdown";

  julia> c.to = "rst";

  julia> c
  `pandoc -f markdown -t rst`

  julia> run(c)
  "Header level 1
  ==============
  "


  mutable struct Converter

    •  input::Union{String, FilePathsBase.AbstractPath, Pandoc.Document, Vector{<:FilePathsBase.AbstractPath}}

    •  from::Union{Nothing, String}

    •  to::Union{Nothing, String}

    •  output::Union{Nothing, String}

    •  defaults::Union{Nothing, String}

    •  file_scope::Bool

    •  sandbox::Bool

    •  data_dir::Union{Nothing, String}

    •  template::Union{Nothing, String}

...
  3. Pandoc Types reimplemented in Julia for writing custom filters in Julia.

Serialization and deserialization of Pandoc's JSON AST are done using JSON3 and StructTypes.

You can use this package to process the JSON AST of a markup document, modify it, and use pandoc to convert it to any other markup.

Here’s an example of converting a Markdown document to Markdown while incrementing the level of every heading by 1.

julia> using Pandoc

julia> doc = Pandoc.Document(raw"""
       # header level 1
       
       This is a paragraph.
       
       ## header level 2
       
       This is another paragraph.
       """);

julia> for block in doc.blocks
         if block isa Pandoc.Header
           block.level += 1
         end
       end

julia> run(Pandoc.Converter(input = doc, from="json", to="markdown")) |> println
## header level 1

This is a paragraph.

### header level 2

This is another paragraph.
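The loop above can be wrapped in a reusable function. The sketch below is a hypothetical helper, `shift_headers!` (not part of Pandoc.jl), that uses only the calls from the example (`Pandoc.Document`, `doc.blocks`, `Pandoc.Header`, `block.level`, `Pandoc.Converter`) and additionally clamps levels to 1..6, since Markdown ATX headings and HTML only go up to level 6. Like the original loop, it only visits top-level blocks, so headings nested inside block quotes or divs are not touched.

```julia
using Pandoc

# Hypothetical helper (not part of Pandoc.jl): shift every top-level heading
# by `n` levels, clamping the result to 1..6 (the range Markdown ATX headings and HTML support).
# Only `doc.blocks` is visited, so headings nested inside other blocks are left unchanged.
function shift_headers!(doc::Pandoc.Document, n::Integer)
    for block in doc.blocks
        if block isa Pandoc.Header
            block.level = clamp(block.level + n, 1, 6)
        end
    end
    return doc
end

doc = Pandoc.Document(raw"""
# header level 1

###### header level 6
""")

shift_headers!(doc, 1)  # level 1 becomes 2; level 6 stays at 6 because of the clamp
run(Pandoc.Converter(input = doc, from = "json", to = "markdown")) |> println
```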

Refer to the documentation for more information: https://kdheepak.com/Pandoc.jl/


Interesting! How do you use this in a typical workflow? Or was this more of a hobby project?

TLDR: If Documenter, Franklin, or Quarto don’t serve your needs and you want something custom-built, you may be interested in this package.

I use Pandoc every chance I get. For work, if I’m writing a report, I start in Pandoc-flavored Markdown and then convert it to docx or PDF when I want to share it with others. I use it heavily for documenting my projects and notes, and convert these notes to HTML, Reveal.js slides, or PPTX when I want to present the content to others. For personal projects, I use Pandoc for my blog.

Pandoc’s Markdown flavor has a number of features, like definition lists and support for divs, and even allows for extensions by writing filters. You can write filters in Lua (the most performant approach) or in any other language that has a JSON reader / writer. Lua is great for writing quick filters, but its standard library is small. I’d personally prefer writing Pandoc filters in Julia or Rust.
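For a JSON filter, pandoc writes the document's AST as JSON to the filter's stdin and reads the modified AST back from its stdout. As a rough sketch of that protocol, the function below does the same heading bump as the announcement example, but on the raw JSON using only JSON3 and no Pandoc.jl types. It assumes the standard AST encoding, where each element is an object with a `"t"` (tag) field and usually a `"c"` (contents) field, and a `Header`'s contents are `[level, attr, inlines]`. The name `bump_headers` is hypothetical.

```julia
using JSON3

# Hypothetical filter function: bump every top-level Header by one level
# by editing the raw Pandoc JSON AST (no Pandoc.jl types involved).
function bump_headers(json::AbstractString)
    # copy() turns the read-only JSON3.Object into a mutable Dict{Symbol,Any},
    # recursively converting nested objects and arrays as well.
    ast = copy(JSON3.read(json))
    for block in ast[:blocks]
        # Each AST element is tagged with :t; a Header's :c is [level, attr, inlines].
        if block[:t] == "Header"
            block[:c][1] += 1
        end
    end
    # Unchanged keys such as "pandoc-api-version" and "meta" are written back as-is.
    return JSON3.write(ast)
end

# In a standalone filter script, the entry point would be:
# print(stdout, bump_headers(read(stdin, String)))
```

To use it as a filter, you could save it as a script with a `#!/usr/bin/env julia` shebang and that entry-point line uncommented, make it executable, and pass it with `pandoc input.md --filter ./bump.jl -o output.md` (JSON3 must be available in the environment Julia runs with). Pandoc.jl's typed structs replace the `:t` / `:c` bookkeeping here with fields like `block.level`.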

This package lets you read a Pandoc JSON AST representation of a markup file (markdown, rst, html, docx, etc), manipulate the AST in Julia types, and write out the markup file again in any other format (markdown, rst, html, epub, pptx, beamer etc).
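As a small sketch of the "parse once, write many formats" idea, the snippet below builds a `Pandoc.Document` from a Markdown string (as in the announcement example) and renders the same AST to several pandoc writers with `Pandoc.Converter` and `run`. The choice of formats and the `outputs` variable are illustrative; any pandoc output format name should work in their place.

```julia
using Pandoc

# Parse a Markdown string into a Pandoc.Document once...
doc = Pandoc.Document(raw"""
# Title

Some *emphasized* text.
""")

# ...then render the same AST to several target formats.
# from = "json" tells pandoc that the input is the AST itself.
outputs = Dict(
    fmt => run(Pandoc.Converter(input = doc, from = "json", to = fmt))
    for fmt in ("html", "rst", "latex")
)

println(outputs["rst"])
```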

Being in Julia, I think one could take advantage of the Julia ecosystem and write custom filters that enable all sorts of functionality.

Hope that helps?
