I’m trying to figure out a way to replace particular nodes in a markdown syntax tree with a new list of sibling sub-trees. For example, I might want to replace the node (subtree) corresponding to the markdown [text with *emphasis*](url) with several sibling subtrees corresponding to e.g. [new text with *emphasis*](url1) and [more text](url2).
For context: I’m thinking about how to implement solutions for #6 and #14 of DocumenterCitations, based on the upcoming release of Documenter that has switched to using MarkdownAST to process .md files in project documentation.
Consider the following MWE:
using Pkg
Pkg.activate(temp=true)
Pkg.add(url="https://github.com/JuliaDocs/Documenter.jl", rev="master")
Pkg.add("MarkdownAST")
Pkg.add("AbstractTrees")
import Documenter
import MarkdownAST
import Markdown
import AbstractTrees
MD_MINIMAL = raw"""
Text with [rabiner_tutorial_1989](@cite).
"""
function parse_md_string(mdsrc)
    mdpage = Markdown.parse(mdsrc)
    return convert(MarkdownAST.Node, mdpage)
end
mdast = parse_md_string(MD_MINIMAL)
println("====== IN =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("=== TRANSFORM ===")
for (i, node) in enumerate(AbstractTrees.PostOrderDFS(mdast))
    println("$i: node.element= $(node.element) [$(length(node.children)) children]")
    # We want to replace nodes that are Links with "@cite" as the target
    # with *several* new nodes
    if node.element == MarkdownAST.Link("@cite", "")
        new_text = "[Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`"
        # In reality, new_text would be derived from the link node. Might
        # contain multiple links and arbitrary inline formatting.
        println("-> Doing transform to new text=\"$new_text\"")
        new_nodes = Documenter.mdparse(new_text; mode=:span)
        # Insert the new nodes to the left of the current node, then drop it
        for n in new_nodes
            MarkdownAST.insert_before!(node, n)
        end
        MarkdownAST.unlink!(node)
    end
end
println("====== OUT =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("====== END =======")
This seems to work fine, producing the following output:
====== IN =======
AS AST:
mdast = @ast MarkdownAST.Document() do
  MarkdownAST.Paragraph() do
    MarkdownAST.Text("Text with ")
    MarkdownAST.Link("@cite", "") do
      MarkdownAST.Text("rabiner")
      MarkdownAST.Emph() do
        MarkdownAST.Text("tutorial")
      end
      MarkdownAST.Text("1989")
    end
    MarkdownAST.Text(".")
  end
end
AS TEXT:
Text with [rabiner*tutorial*1989](@cite).
=== TRANSFORM ===
1: node.element= MarkdownAST.Text("Text with ") [0 children]
2: node.element= MarkdownAST.Text("rabiner") [0 children]
3: node.element= MarkdownAST.Text("tutorial") [0 children]
4: node.element= MarkdownAST.Emph() [1 children]
5: node.element= MarkdownAST.Text("1989") [0 children]
6: node.element= MarkdownAST.Link("@cite", "") [3 children]
-> Doing transform to new text="[Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`"
7: node.element= MarkdownAST.Text(".") [0 children]
8: node.element= MarkdownAST.Paragraph() [5 children]
9: node.element= MarkdownAST.Document() [1 children]
====== OUT =======
AS AST:
mdast = @ast MarkdownAST.Document() do
  MarkdownAST.Paragraph() do
    MarkdownAST.Text("Text with ")
    MarkdownAST.Link("https://github.com/JuliaDocs/DocumenterCitations.jl", "") do
      MarkdownAST.Text("Citation")
    end
    MarkdownAST.Text(" for ")
    MarkdownAST.Code("key")
    MarkdownAST.Text(".")
  end
end
AS TEXT:
Text with [Citation](https://github.com/JuliaDocs/DocumenterCitations.jl) for `key`.
====== END =======
It also works for a longer document, using
MD_FULL = raw"""
# Markdown document
Let's just have a couple of paragraphs with inline elements like *italic* or
**bold**.
We'll also have inline math like ``x^2`` (using the double-backtick syntax
preferred by [Julia](https://docs.julialang.org/en/v1/stdlib/Markdown/#\\LaTeX),
in lieu of `$`)
## Citation
Some citation links [rabiner_tutorial_1989; with *emphasis*](@cite) (with
inline formatting) and [GoerzQ2022](@cite) (without inline formatting).
## Lists
* First item with just plain text
* Second item with *emphasis*
* Third item with `code`
This concludes the file.
"""
instead of MD_MINIMAL. The success depends very much on using AbstractTrees.PostOrderDFS as well as inserting to the left of the current node (MarkdownAST.insert_before!). Note how step 7 continues at exactly the correct element of the original tree, and step 8 shows the new number of children (5, not 3) without getting confused. Using PreOrderDFS would fail (at least for MD_FULL). The behavior makes sense, although I could imagine implementations of PostOrderDFS that would not be able to handle this specific mutation of the tree as it’s being traversed. MarkdownAST rightfully warns about mutating the tree while traversing it. Still, AbstractTrees.PostOrderDFS seems to handle this particular mutation perfectly fine.
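One variant that would not depend on how the iterator copes with mutation is to separate the two phases: first traverse the unmodified tree and collect the matching nodes, then do the replacements once the traversal has finished. The following is only a minimal sketch under that assumption. The names replace_with_siblings!, is_cite, and citation_nodes are hypothetical helpers, not part of MarkdownAST, and the replacement nodes are built by hand here instead of with Documenter.mdparse to keep the example self-contained.

```julia
import MarkdownAST
import AbstractTrees

# Hypothetical helper (not part of MarkdownAST): replace every node for which
# `is_target(node)` is true by the sibling nodes that `make_replacements(node)`
# returns.
function replace_with_siblings!(make_replacements, root::MarkdownAST.Node, is_target)
    # Phase 1: traverse the tree without touching it and remember the matches.
    targets = [node for node in AbstractTrees.PostOrderDFS(root) if is_target(node)]
    # Phase 2: mutate only after the traversal is complete.
    for node in targets
        for new_node in make_replacements(node)
            MarkdownAST.insert_before!(node, new_node)
        end
        MarkdownAST.unlink!(node)
    end
    return root
end

# A link node counts as a citation if its destination is "@cite".
is_cite(node) = node.element isa MarkdownAST.Link && node.element.destination == "@cite"

# Stand-in for the real transformation: always returns the same three nodes.
function citation_nodes(node)
    link = MarkdownAST.Node(MarkdownAST.Link("https://github.com/JuliaDocs/DocumenterCitations.jl", ""))
    push!(link.children, MarkdownAST.Node(MarkdownAST.Text("Citation")))
    return [
        link,
        MarkdownAST.Node(MarkdownAST.Text(" for ")),
        MarkdownAST.Node(MarkdownAST.Code("key")),
    ]
end

# Build a small document by hand: a paragraph "Text with [key](@cite)."
doc = MarkdownAST.Node(MarkdownAST.Document())
para = MarkdownAST.Node(MarkdownAST.Paragraph())
push!(doc.children, para)
push!(para.children, MarkdownAST.Node(MarkdownAST.Text("Text with ")))
cite = MarkdownAST.Node(MarkdownAST.Link("@cite", ""))
push!(cite.children, MarkdownAST.Node(MarkdownAST.Text("key")))
push!(para.children, cite)
push!(para.children, MarkdownAST.Node(MarkdownAST.Text(".")))

replace_with_siblings!(citation_nodes, doc, is_cite)
```

The cost is a temporary list of target nodes, but the traversal order no longer matters for correctness, so this should behave the same with PreOrderDFS. Whether that trade-off is worthwhile for large documents is something I have not measured.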
Did I overlook something that would make this not work for my use case? Would it be a bad idea to assume that future versions of AbstractTrees will continue to work correctly here? Does anyone have any suggestions on how to write this in a more robust way? I just started looking at the details of the MarkdownAST package, so I’m not yet deeply familiar with all of its facilities.
Maybe @mortenpi as the originator of MarkdownAST.jl has some insights here?
This sort of thing might also be worth including as a Howto in the documentation of MarkdownAST.