Replace a markdown node with MarkdownAST/AbstractTrees

I’m trying to figure out a way to replace particular nodes in a markdown syntax tree with a new list of sibling subtrees. For example, I might want to replace the node (subtree) corresponding to the markdown [text with *emphasis*](url) with several sibling subtrees corresponding to e.g. [new text with *emphasis*](url1) and [more text](url2).

For context: I’m ruminating about how I might go about implementing solutions for #6 and #14 of DocumenterCitations, based on the upcoming release of Documenter that has switched to using MarkdownAST to process .md files in project documentation.

Consider the following MWE:

using Pkg
Pkg.activate(temp=true)
Pkg.add(url="https://github.com/JuliaDocs/Documenter.jl", rev="master")
Pkg.add("MarkdownAST")
Pkg.add("AbstractTrees")

import Documenter
import MarkdownAST
import Markdown
import AbstractTrees

MD_MINIMAL = raw"""
Text with [rabiner_tutorial_1989](@cite).
"""

function parse_md_string(mdsrc)
    mdpage = Markdown.parse(mdsrc)
    return convert(MarkdownAST.Node, mdpage)
end

mdast = parse_md_string(MD_MINIMAL)
println("====== IN =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("=== TRANSFORM ===")
for (i, node) in enumerate(AbstractTrees.PostOrderDFS(mdast))
    println("$i: node.element= $(node.element) [$(length(node.children)) children]")
    # We want to replace nodes that are Links with "@cite" as the target with *several* new nodes
    if node.element == MarkdownAST.Link("@cite", "")
        new_text = "[Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`"
        # In reality, new_text would be derived from the link node. Might
        # contain multiple links and arbitrary inline formatting.
        println("-> Doing transform to new text=\"$new_text\"")
        new_nodes = Documenter.mdparse(new_text; mode=:span)
        for n in new_nodes
            MarkdownAST.insert_before!(node, n)
        end
        MarkdownAST.unlink!(node)
    end
end
println("====== OUT =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("====== END =======")

This seems to work fine, producing the following output:

====== IN =======
AS AST:
mdast = @ast MarkdownAST.Document() do
  MarkdownAST.Paragraph() do
    MarkdownAST.Text("Text with ")
    MarkdownAST.Link("@cite", "") do
      MarkdownAST.Text("rabiner")
      MarkdownAST.Emph() do
        MarkdownAST.Text("tutorial")
      end
      MarkdownAST.Text("1989")
    end
    MarkdownAST.Text(".")
  end
end

AS TEXT:
Text with [rabiner*tutorial*1989](@cite).
=== TRANSFORM ===
1: node.element= MarkdownAST.Text("Text with ") [0 children]
2: node.element= MarkdownAST.Text("rabiner") [0 children]
3: node.element= MarkdownAST.Text("tutorial") [0 children]
4: node.element= MarkdownAST.Emph() [1 children]
5: node.element= MarkdownAST.Text("1989") [0 children]
6: node.element= MarkdownAST.Link("@cite", "") [3 children]
-> Doing transform to new text="[Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`"
7: node.element= MarkdownAST.Text(".") [0 children]
8: node.element= MarkdownAST.Paragraph() [5 children]
9: node.element= MarkdownAST.Document() [1 children]
====== OUT =======
AS AST:
mdast = @ast MarkdownAST.Document() do
  MarkdownAST.Paragraph() do
    MarkdownAST.Text("Text with ")
    MarkdownAST.Link("https://github.com/JuliaDocs/DocumenterCitations.jl", "") do
      MarkdownAST.Text("Citation")
    end
    MarkdownAST.Text(" for ")
    MarkdownAST.Code("key")
    MarkdownAST.Text(".")
  end
end

AS TEXT:
Text with [Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`.
====== END =======

It also works for a longer document, using

MD_FULL = raw"""
# Markdown document

Let's just have a couple of paragraphs with inline elements like *italic* or
**bold**.

We'll also have inline math like ``x^2`` (using the double-backtick syntax
preferred by [Julia](https://docs.julialang.org/en/v1/stdlib/Markdown/#\\LaTeX),
in lieu of `$`)

## Citation

Some citation links [rabiner_tutorial_1989; with *emphasis*](@cite) (with
inline formatting) and [GoerzQ2022](@cite) (without inline formatting).

## Lists

* First item with just plain text

* Second item with *emphasis*

* Third item with `code`

This concludes the file.
"""

instead of MD_MINIMAL. The success depends heavily on using AbstractTrees.PostOrderDFS and on inserting to the left of the current node (MarkdownAST.insert_before!). Note how step 7 in the output continues at exactly the correct element of the original tree, and step 8 shows the new number of children (5, not 3) without getting confused. Using PreOrderDFS would fail (at least for MD_FULL).

The behavior makes sense, although I could imagine implementations of PostOrderDFS that would not be able to handle this specific mutation of the tree while it is being traversed. The MarkdownAST documentation rightfully warns about mutating the tree while traversing it. Still, AbstractTrees.PostOrderDFS seems to handle this particular mutation perfectly fine.

Did I overlook something that would make this not work for my use case? Would it be a bad idea to assume that future versions of AbstractTrees will continue to work correctly here? Does anyone have any suggestions on how to write this in a more robust way? I just started looking at the details of the MarkdownAST package, so I’m not yet deeply familiar with all of its facilities.

Maybe @mortenpi as the originator of MarkdownAST.jl has some insights here? :wink:

This sort of thing might also be something to include as a Howto in the documentation of MarkdownAST.

Yes, I would say whether the mutation happens to work correctly for a given iterator is very much an implementation detail. An internal change in either MarkdownAST or AbstractTrees might (theoretically) break things.

Also, since whether it works or not depends on the iterator and how you mutate the tree, mutating the structure while iterating is just a bad pattern in my opinion. Even if it happens to work and is very unlikely to break, later changes to your own code may cause a hard-to-spot failure.

I think my suggestion for handling these types of mutations would be to do a non-mutating pass over the whole tree first, pick out the matching Nodes, push them to an array, and then mutate them in a separate loop afterwards.
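A minimal sketch of that collect-then-mutate approach, to make it concrete. The names is_cite_link, replacement_nodes, and replace_cite_links! are hypothetical helpers introduced only for illustration, and the replacement nodes are built by hand rather than with Documenter.mdparse so that the example needs only MarkdownAST and AbstractTrees. It also assumes that the Link element stores its target in a field named destination.

```julia
import MarkdownAST
import AbstractTrees
using MarkdownAST: @ast

# Hypothetical predicate: a Link whose target is "@cite".
# Assumes the Link element exposes its target as `destination`.
is_cite_link(node) =
    node.element isa MarkdownAST.Link && node.element.destination == "@cite"

# Hypothetical stand-in for deriving the replacement from the link node.
# Returns three sibling nodes: a link, some text, and inline code.
function replacement_nodes(node)
    link = MarkdownAST.Node(
        MarkdownAST.Link("https://github.com/JuliaDocs/DocumenterCitations.jl", "")
    )
    push!(link.children, MarkdownAST.Node(MarkdownAST.Text("Citation")))
    return [
        link,
        MarkdownAST.Node(MarkdownAST.Text(" for ")),
        MarkdownAST.Node(MarkdownAST.Code("key")),
    ]
end

function replace_cite_links!(root)
    # Pass 1: read-only traversal that only records the matching nodes.
    matches = [n for n in AbstractTrees.PreOrderDFS(root) if is_cite_link(n)]
    # Pass 2: no iterator is active any more, so mutating the tree is safe.
    for node in matches
        for new_node in replacement_nodes(node)
            MarkdownAST.insert_before!(node, new_node)
        end
        MarkdownAST.unlink!(node)
    end
    return root
end

doc = @ast MarkdownAST.Document() do
    MarkdownAST.Paragraph() do
        MarkdownAST.Text("Text with ")
        MarkdownAST.Link("@cite", "") do
            MarkdownAST.Text("rabiner_tutorial_1989")
        end
        MarkdownAST.Text(".")
    end
end

replace_cite_links!(doc)
```

Because the matches are gathered before anything is changed, the traversal order no longer matters: PreOrderDFS and PostOrderDFS should give the same result here. A cite link nested inside another cite link would still be collected, but it would then be rewritten inside a subtree that has already been detached, which is harmless.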

This sort of thing might also be something to include as a Howto in the documentation of MarkdownAST.

Yes!

Right, I suppose I knew that was the correct answer when I posted: you can’t safely iterate and mutate at the same time :wink:

At least on paper/pseudocode, building a mutated copy of the original AST is relatively straightforward. What I was getting hung up on – why I didn’t start with that in the first place – was how to “copy” a node without its children. Looking at the source code of copy_tree really clarified how to do that. So I think using the recursive rewrite_links function below (modeled after copy_tree) is a clean way to implement this:

using Pkg
Pkg.activate(temp=true)
Pkg.add(url="https://github.com/JuliaDocs/Documenter.jl", rev="master")
Pkg.add("MarkdownAST")
Pkg.add("AbstractTrees")

import Documenter
import MarkdownAST
import Markdown
import AbstractTrees

MD_MINIMAL = raw"""
Text with [rabiner_tutorial_1989](@cite).
"""

function parse_md_string(mdsrc)
    mdpage = Markdown.parse(mdsrc)
    return convert(MarkdownAST.Node, mdpage)
end

function rewrite_links(node::MarkdownAST.Node{M}) where M
    new_node = MarkdownAST.Node{M}(node.element, deepcopy(node.meta))
    for child in node.children
        if child.element == MarkdownAST.Link("@cite", "")
            new_text = "[Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`"
            # In reality, new_text would be derived from the link node. Might
            # contain multiple links and arbitrary inline formatting.
            println("-> Doing transform to new text=$new_text")
            expanded = Documenter.mdparse(new_text; mode=:span)
            append!(new_node.children, expanded)
        else
            push!(new_node.children, rewrite_links(child))
        end
    end
    return new_node
end

mdast = parse_md_string(MD_MINIMAL)
println("====== IN =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("=== TRANSFORM ===")
mdast = rewrite_links(mdast)
println("====== OUT =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("====== END =======")

It doesn’t technically mutate the original mdast, but I can always replace the original root’s children with the new root’s children at the end.
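Swapping the children over at the end could look roughly like the following sketch. The function name replace_children! is hypothetical, not part of MarkdownAST. Each child is explicitly unlinked before it is pushed, so the sketch does not rely on push! detaching a node from its previous parent.

```julia
import MarkdownAST
using MarkdownAST: @ast

# Hypothetical helper: move all children of `source` into `target`,
# discarding whatever children `target` had before.
function replace_children!(target, source)
    # Snapshot the child lists first, so that neither loop mutates
    # the collection it is iterating over.
    for child in [c for c in target.children]
        MarkdownAST.unlink!(child)
    end
    for child in [c for c in source.children]
        MarkdownAST.unlink!(child)      # detach from `source`
        push!(target.children, child)   # re-attach under `target`
    end
    return target
end

old_root = @ast MarkdownAST.Document() do
    MarkdownAST.Paragraph() do
        MarkdownAST.Text("old")
    end
end

new_root = @ast MarkdownAST.Document() do
    MarkdownAST.Paragraph() do
        MarkdownAST.Text("new 1")
    end
    MarkdownAST.Paragraph() do
        MarkdownAST.Text("new 2")
    end
end

replace_children!(old_root, new_root)
```

After the call, old_root keeps its identity (any outside reference to it stays valid) but holds the rewritten content, while new_root is left empty.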

I think this can be generalized to an implementation of Base.replace, with Base.replace! for the in-place version: see issue #20 of JuliaDocs/MarkdownAST.jl, "Implement `Base.replace` and `Base.replace!`".