[ANN] SummaryTables.jl & WriteDocx.jl

The development team at PumasAI is happy to announce the open-source release of two packages:

SummaryTables.jl for generating publication-ready tables in HTML, docx, LaTeX and Typst. We wrote SummaryTables from scratch because existing solutions for table generation like PrettyTables did not support all the table structures (for example merged cells) or output formats (docx, Typst) we needed.
WriteDocx.jl for generating Microsoft Word compatible docx files from scratch. We wrote WriteDocx to have a completely flexible and precisely adjustable backend for SummaryTables and report generation, without having to fall back to tools that do “lossy” conversions through intermediate markdown representations, like Pandoc.

We hope that we can give back to the Julia community with the release of these packages and strengthen its utility ecosystem.

What follows is a short description of both packages; for more, please check out their documentation.

SummaryTables

SummaryTables.jl is a Julia package for creating publication-ready tables in HTML, docx, LaTeX and Typst formats. Tables are formatted in a minimalistic style without vertical lines.

SummaryTables offers the table_one, summarytable and listingtable functions to generate pharmacological tables from Tables.jl-compatible data structures, as well as a low-level API to construct tables of any shape manually.

Examples

using DataFrames
using SummaryTables

data = DataFrame(
    sex = ["m", "m", "m", "m", "f", "f", "f", "f", "f", "f"],
    age = [27, 45, 34, 85, 55, 44, 24, 29, 37, 76],
    blood_type = ["A", "0", "B", "B", "B", "A", "0", "A", "A", "B"],
    smoker = [true, false, false, false, true, true, true, false, false, false],
)

table_one(
    data,
    [:age => "Age (years)", :blood_type => "Blood type", :smoker => "Smoker"],
    groupby = :sex => "Sex",
    show_n = true
)

using Statistics  # provides mean and std

data = DataFrame(
    concentration = [1.2, 4.5, 2.0, 1.5, 0.1, 1.8, 3.2, 1.8, 1.2, 0.2,
        1.7, 4.2, 1.0, 0.9, 0.3, 1.7, 3.7, 1.2, 1.0, 0.2],
    id = repeat([1, 2, 3, 4], inner = 5),
    dose = repeat([100, 200], inner = 10),
    time = repeat([0, 0.5, 1, 2, 3], 4)
)

listingtable(
    data,
    :concentration => "Concentration (ng/mL)",
    rows = [:dose => "Dose (mg)", :id => "ID"],
    cols = :time => "Time (hr)",
    summarize_rows = :dose => [
        length => "N",
        mean => "Mean",
        std => "SD",
    ]
)

categories = ["Deciduous", "Deciduous", "Evergreen", "Evergreen", "Evergreen"]
species = ["Beech", "Oak", "Fir", "Spruce", "Pine"]
data = rand(4, 5)
labels = ["", "", "Size", Annotated("Water consumption", "Liters per year"), "Age", "Value"]

body = [
    Cell.(categories, bold = true, merge = true, border_bottom = true)';
    Cell.(species)';
    Cell.(data)
]

Table(hcat(
    Cell.(labels, italic = true, halign = :right),
    body
))

Comparison with PrettyTables

PrettyTables.jl is a well-known Julia package whose main function is formatting tabular data, for example as the backend to DataFrames.jl. PrettyTables supports plain-text output because it is often used for rendering tables to the REPL. However, this also means that it does not currently support merging cells vertically or horizontally, which is difficult to realize in plain text.

In contrast, SummaryTables’s main purpose is to offer convenience functions for creating specific scientific tables, which are out of scope for PrettyTables. For our desired aesthetics, we also needed low-level control over certain output formats, for example over cell border behavior in docx, and such control was unlikely to be added to PrettyTables at the time of writing this package.

WriteDocx

WriteDocx is a utility package that lets you create .docx files compliant with ECMA-376, for use with Microsoft Office Word and other compatible software. Under the hood, these files are zip files containing a standardized folder structure with XML files and other assets.

WriteDocx contains many Julia types that mirror the types of XML nodes commonly found in docx files, without the user having to write any XML manually.

Here’s a simple document with two paragraphs, one of which has pink-colored text:

import WriteDocx as W

doc = W.Document(
    W.Body([
        W.Section([
            W.Paragraph([
                W.Run([W.Text("Hello world, from WriteDocx.jl")]),
            ]),
            W.Paragraph([
                W.Run(
                    [W.Text("Goodbye!")],
                    color = W.HexColor("FF00FF"),
                ),
            ]),
        ]),
    ]),
)

W.save("example.docx", doc)
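Since a docx is just a zip archive, one way to see the folder structure mentioned above is to list the archive's entries. The following is only a sketch: it assumes the ZipFile.jl package is installed, and `list_entries` is a hypothetical helper name, not part of WriteDocx.

```julia
using ZipFile

# Hypothetical helper (not part of WriteDocx): return the names of all
# entries stored in a zip-based file such as a docx.
function list_entries(path::AbstractString)
    reader = ZipFile.Reader(path)
    try
        return [f.name for f in reader.files]
    finally
        close(reader)  # always release the file handle
    end
end

# "example.docx" is the file written by W.save above; skip if it is absent.
if isfile("example.docx")
    foreach(println, list_entries("example.docx"))
end
```

For a valid docx the listing should include entries such as word/document.xml, which holds the main document content.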

Pretty cool, thanks for sharing.

On the docx front, would it be possible to open existing Word documents and insert content at specific places? I.e. if I have a table in a Word document populated with the results of some Julia analysis, could I directly swap out numbers in the table with WriteDocx?

Word documents have an insane amount of XML tags in them that are not the “content” per se, for example for tracking changes and other complex features. This is the reason that WriteDocx has “Write” in its name and purposefully doesn’t attempt to read the files into a Julia-native data structure (there are just too many things to implement that I don’t really care about).

But you can totally take WriteDocx data structures and render them to XML. This XML you can then splice into existing Word documents at the right position and that will usually just work. It’s a little more complex if you want to insert assets like pngs or svgs, because they need entries in sidecar files that you would have to keep track of. But for tables and numbers, just inserting XML will be fine. I actually already wrote a proof of concept package for this (not open source) and it’s not a lot of lines to implement. You unzip the docx, load the main XML file with EzXML to find the place you’re interested in (there are also query operators for that) and insert the XML node you generate with WriteDocx there. Then you write the XML file out and re-zip everything into a docx.

I wanted to try it out again, this was from scratch in a couple minutes.

Let’s say you have this document:

Then you can open the docx, find the w:tbl element that contains the placeholder text, and splice the SummaryTables table in there.

using SummaryTables
using WriteDocx
using EzXML
using ZipFile


file = expanduser("~/Downloads/template.docx")

function modify_docx(func, file, outputfile)
    mktempdir() do dir
        # Unpack every entry of the docx (a zip archive) into a temporary directory
        r = ZipFile.Reader(file)
        for f in r.files
            realpath = joinpath(dir, f.name)
            mkpath(dirname(realpath))
            open(realpath, "w") do io
                write(io, read(f, String))
            end
        end
        close(r)
        # Let the caller modify the unpacked files in place
        func(dir)
        # Re-zip the directory contents into the output docx
        w = ZipFile.Writer(outputfile)
        for (root, dirs, files) in walkdir(dir)
            for file in files
                rpath = relpath(joinpath(root, file), dir)
                zf = ZipFile.addfile(w, rpath; method = ZipFile.Deflate)
                write(zf, read(joinpath(root, file), String))
            end
        end
        close(w)
    end
    return
end

modify_docx(file, "output.docx") do dir
    docfile = joinpath(dir, "word/document.xml")
    xml = EzXML.readxml(docfile)
    tbl = only(findall("//w:tbl[contains(., 'A placeholder table')]", xml.root))
    prev = EzXML.prevnode(tbl)
    EzXML.unlink!(tbl)

    data = (; Age = randn(100) .* 100, Sex = rand(["male", "female"], 100), Weight = rand(100) .* 50 .+ 40)
    new_tbl = table_one(data, [:Age, :Sex, :Weight])
    new_tbl_xml = WriteDocx.to_xml(SummaryTables.to_docx(new_tbl), nothing)

    EzXML.linknext!(prev, new_tbl_xml)
    open(docfile, "w") do io
        EzXML.prettyprint(io, xml)
    end
end

The output document looks like this:

For some reason that I don’t immediately know, each cell here has a paragraph margin. That’s why the table is too tall, but that is fixable.

Just to let you know, the link to PumasAI (presumably) is missing the URL.

Thanks, copy paste error :slight_smile:

Are there plans to support other languages than English in SummaryTables.jl? I understand that most authors will use the package in English but non-English speakers could want to use it in their own language.

For example, using table_one I would like that the header of the “Overall” column were “Global”, the translation of overall into Spanish. Is there an easy way to change that? Maybe it’s in the documentation but I couldn’t find it.

I wouldn’t say “plans” as in I’ve actually planned to add localization, but I want the package to be completely customizable, so changing that label should also be made possible. One way would be to add an overall_label keyword to which you could pass "Global". But maybe it would make more sense to localize the whole package at once, by looking up the labels dynamically and letting the user change how they are all looked up. For example, you would also not want Mean in a Spanish table, but at some point it’s silly to keep adding x_label keywords. So one could use a new LocalizedString type internally instead and the user would pass a dictionary with all the translations. This could also maybe be done with ScopedValues.
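To make the dictionary idea a bit more concrete, here is a rough sketch of how a dynamic label lookup with ScopedValues could look. None of this exists in SummaryTables: `LABEL_TRANSLATIONS` and `localized` are hypothetical names, the Spanish strings are just illustrative, and the code assumes Julia 1.11 or newer, where `Base.ScopedValues` is available.

```julia
using Base.ScopedValues  # in Base since Julia 1.11

# Hypothetical translation table, mapping built-in English labels to
# replacements. An empty dictionary means "use the English defaults".
const LABEL_TRANSLATIONS = ScopedValue(Dict{String,String}())

# Hypothetical helper that a table function could call instead of
# hard-coding a label; unknown labels fall back to themselves.
localized(label::AbstractString) = get(LABEL_TRANSLATIONS[], label, label)

spanish = Dict("Overall" => "Global", "Mean" => "Media")

# Inside the `with` block, every lookup sees the Spanish labels ...
header = with(LABEL_TRANSLATIONS => spanish) do
    [localized("Overall"), localized("Mean"), localized("SD")]
end

# ... and outside of it the defaults apply again.
fallback = localized("Overall")
```

Here "SD" has no entry in the dictionary, so it passes through unchanged, which means a user would only have to supply the labels they actually want to replace.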

You could open an issue for further discussion :slight_smile:

Currently your only option is taking the Table object, inspecting the cell array and replacing the Cell with the Overall label. That accesses internals though.

I’m happy if you’re aware of the situation.

I believe that the internationalisation could be done using GNU gettext and https://docs.juliahub.com/General/Gettext_jll/stable/. Some years ago I used it in a small Python program and I remember that it wasn’t a difficult task, but I have no idea if it would be the case in Julia.

But, maybe there are better/easier solutions.

Thanks for the link to gettext, wasn’t aware of that. Maybe it’s a bit overkill because I’m not going to be able to supply a bunch of language packs anyway, but it might give some design ideas.