[ANN] UpSetPlot.jl

Announcing UpSetPlot.jl

I am announcing UpSetPlot.jl, a minimalist Julia package for drawing UpSet plots, a visualization method for set data with more than three intersecting sets. UpSet plots are now frequently used instead of Venn diagrams, especially in the life sciences.

Features

  • Draws UpSet plots.
  • Horizontal or vertical orientation, with an optional cumulative intersection size for each intersection degree.
  • Produces the lists of elements specific to each set intersection.
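To make "elements specific to each set intersection" concrete, here is a minimal sketch in plain Julia. It is not the package's implementation: `exclusive_intersections` is a hypothetical helper, and joining set names with `_` is an assumption that only mirrors keys such as `Set1_Set2_Set6` in the example further down.

```julia
# Hypothetical helper, NOT part of UpSetPlot.jl: group every element by the
# exact combination of sets that contain it (its "exclusive" intersection).
function exclusive_intersections(sets::Dict{String,Vector{String}})
    set_names = sort(collect(keys(sets)))
    members = Dict(name => Set(sets[name]) for name in set_names)
    groups = Dict{String,Vector{String}}()
    for element in sort(collect(union(values(members)...)))
        # Names of all sets containing this element, joined in sorted order.
        key = join([n for n in set_names if element in members[n]], "_")
        push!(get!(groups, key, String[]), element)
    end
    return groups
end

# Toy data for illustration only.
sets = Dict(
    "A" => ["x1", "x2", "x3"],
    "B" => ["x2", "x3", "x4"],
    "C" => ["x3", "x5"],
)
groups = exclusive_intersections(sets)
```

Here `x3` lands only under `"A_B_C"` and `x2` only under `"A_B"`, so each element is counted exactly once. That property is what lets an UpSet plot scale past three sets.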

Installation

using Pkg
Pkg.add("UpSetPlot")

Example

julia> using UpSetPlot
julia> using DataFrames
julia> set_names = ["Set1", "Set2", "Set3", "Set4", "Set5", "Set6"]
julia> df1 = DataFrame(
    Set1 = ["ID01", "ID02", "ID03", "ID07", "ID08", "ID09", "ID10", "ID04", "ID05", "ID06"],
    Set2 = ["ID01", "ID02", "ID03", "ID04", "ID05", "ID11", "ID12", "ID13", "ID14", "ID15"],
    Set3 = ["ID01", "ID02", "ID03", "ID07", "ID13", missing, missing, missing, missing, missing],
    Set4 = ["ID16", "ID17", "ID18", "ID19", "ID14", "ID02", "ID03", missing, missing, missing],
    Set5 = fill(missing, 10),
    Set6 = ["ID01", "ID02", "ID03", "ID04", "ID05", "ID11", "ID12", "ID13", "ID16", missing],
    Col_x = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10"]
)
julia> df2 = DataFrame(
    id = ["ID01", "ID02", "ID03", "ID04", "ID05", "ID06", "ID07", "ID08", "ID09", "ID10", "ID11", "ID12", "ID13", "ID14", "ID15", "ID16", "ID17", "ID18", "ID19"],
    Set1 = [true, true, true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false],
    Set2 = [true, true, true, true, true, false, false, false, false, false, true, true, true, true, true, false, false, false, false],
    Set3 = [true, true, true, false, false, false, true, false, false, false, false, false, true, false, false, false, false, false, false],
    Set4 = [false, true, true, false, false, false, false, false, false, false, false, false, false, true, false, true, true, true, true],
    Set5 = fill(false, 19),
    Set6 = [true, true, true, true, true, false, false, false, false, false, true, true, true, false, false, true, false, false, false]
)
julia> fig1, lists1 = upset_plot(df1; set_names=set_names, intersection_lists=true);
julia> fig2, lists2 = upset_plot(df2; orientation=:h, cumul=true, intersection_lists=true);
julia> lists1 == lists2
true
julia> lists1["Set1"] == ["ID06", "ID08", "ID09", "ID10"]
true
julia> lists2["Set1_Set2_Set6"] == ["ID04", "ID05"]
true
julia> to_dataframe(lists1)

julia> display(fig1)

julia> display(fig2)
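
The example passes the same data in two layouts: `df1` holds one column of IDs per set, padded with `missing`, while `df2` holds one row per ID with boolean membership flags. The sketch below shows how the two layouts relate, using plain Julia collections instead of DataFrames. `membership_table` is a hypothetical helper for illustration and says nothing about how the package parses its input.

```julia
# Hypothetical helper, NOT part of UpSetPlot.jl: convert columns of IDs
# (padded with `missing`) into per-ID boolean membership flags.
function membership_table(columns)
    set_names = sort(collect(keys(columns)))
    ids = String[]
    for name in set_names, id in columns[name]
        # Skip padding and IDs that were already collected.
        ismissing(id) || id in ids || push!(ids, id)
    end
    sort!(ids)
    # `isequal` treats `missing` as an ordinary value, so the result is a Bool.
    flags = Dict(name => [any(isequal(id), columns[name]) for id in ids]
                 for name in set_names)
    return ids, flags
end

# Toy data for illustration only.
columns = Dict(
    "Set1" => ["ID01", "ID02", "ID03"],
    "Set2" => ["ID02", "ID04", missing],
    "Set3" => [missing, missing, missing],  # an empty set, like Set5 above
)
ids, flags = membership_table(columns)
```

An all-`missing` column becomes an all-`false` flag vector, which matches how `Set5` is written in `df1` and `df2`.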

That is a very nice plot, thanks for sharing!

A first feature request would be to sort the rows/columns by intersection size. That is convenient, as explained on the Wikipedia page. Maybe an option sort=true that is enabled by default?
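
As a rough sketch of what such sorting could look like, assuming the intersections are available as a Dict that maps an intersection name to its elements (the shape of `lists1` in the example above), the ordering takes one `sort` call. The data and the tie-breaking rule below are illustrative assumptions, not package behavior.

```julia
# Toy intersection lists for illustration only.
lists = Dict(
    "A"     => ["x1", "x2", "x3", "x4"],
    "A_B"   => ["x5", "x6"],
    "B"     => ["x7"],
    "A_B_C" => ["x8", "x9"],
)

# Largest intersections first; ties are broken alphabetically by name so the
# order does not depend on Dict iteration order.
sorted = sort(collect(lists); by = p -> (-length(last(p)), first(p)))

for (name, elements) in sorted
    println(rpad(name, 8), length(elements))
end
```

A `sort` keyword could then simply decide whether this ordering or the original one is used for the bars.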

Also, I would try to rename the functions to make them more idiomatic for Julia users, e.g., without underscores. upsetplot would be preferred. The function to_dataframe also looks suspicious to those of us who have been programming in Julia for a long time.

As a final suggestion, consider making this a Makie.jl recipe instead, so that people can use the recipe with any backend, including interactive ones (e.g., GLMakie.jl):

For the record, I would strongly prefer upset_plot, and I don’t think it is any less idiomatic than upsetplot. As the thread "The current state of function naming style" shows, the mashedcase convention that is encouraged (but inconsistently applied) in core Julia is rather controversial across the wider Julia ecosystem, and certainly does not reflect a consensus (unlike snake_case).

Cross-posting as it is relevant here too:

That goes against the convention in most plotting libraries though, and that is not specific to Julia:

  • heatmap is not heat_map
  • scatterlines is not scatter_lines
  • biplot is not bi_plot
  • barplot is not bar_plot
  • streamplot is not stream_plot
  • etc.