[ANN] A Package for Generating Corner Plots (PairPlots.jl)


Corner plots are a useful way of visualizing high-dimensional data by presenting a grid of 2D histograms comparing each pair of variables. This package aims to provide a flexible interface for generating these grids while, by default, reproducing the output of corner.py as closely as possible.
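To make the grid structure concrete, here is a small sketch (not code from PairPlots.jl) that enumerates the panels of an n-variable corner plot. The diagonal holds a 1D histogram of each variable, the lower triangle holds a 2D histogram for each pair, and the upper triangle stays empty. The function name `corner_panels` is hypothetical.

```julia
# Hypothetical helper: list the panels of an n-variable corner plot as (row, col, kind).
# Diagonal cells (row == col) get a 1D histogram of that variable; cells below the
# diagonal (row > col) get a 2D histogram of variable `col` (x) against variable `row` (y).
# Upper-triangle cells are never produced.
function corner_panels(n::Integer)
    panels = Tuple{Int,Int,Symbol}[]
    for row in 1:n, col in 1:row          # lower triangle including the diagonal
        kind = row == col ? :hist1d : :hist2d
        push!(panels, (row, col, kind))
    end
    return panels
end

corner_panels(3)   # 3 diagonal panels + 3 pair panels = n(n+1)/2 = 6 panels
```

The panel count grows as n(n+1)/2, which is why these grids get crowded quickly for many variables.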

Usage:

using Plots, PairPlots

a = randn(1000)
b = randn(1000)
data = (;a, b)

corner(data)

This is built on RecipesBase rather than Plots directly, so it should be fairly lightweight. Currently only tested with the GR backend.

Here are a few example plots:

The default is to overplot the data points, 2D histograms, and contour plots, but these can be turned on or off. It’s easy to customize each series and even swap out the series types. For instance, the 1D histograms can be changed from :step to :line just by passing in a new seriestype via hist_kwargs. Or, see the README for an example of using 3D wireframes on the off-diagonals.

This package isn’t yet registered since I haven’t settled on a name. The obvious CornerPlots turns out to be taken by the existing CornerPlot.jl, a package with similar goals for Gadfly. Name suggestions welcome!

Edit: now called PairPlots.jl


That looks like excellent stuff.

Suggestion: Scatterplot Matrices

Thanks, that’s not bad! From that same link, there is also PairPlots. That would be pretty easy to remember.


Thank you @sefffal for the contribution. We have our own corner plot recipe in GeoStats.jl as you mentioned in your README and could migrate it to a separate package like yours if it is maintained and expanded with more options in the future. In our case, we correct the histogram using the spatial coordinates of the samples: Plotting · GeoStats.jl

Would your recipe allow custom estimators of the histograms in the future?

Hi, I would be happy to add that functionality. Currently I calculate the histograms using the fit function in StatsBase. If we added a way to control that fit, would that be sufficient? Or would it be better to directly accept a precalculated histogram for each plot?

It would be great if we could pass in a function to compute the histogram on the fly for each entry in the corner plot. We have an EmpiricalHistogram type in GeoStats.jl that stores a classical Histogram inside but applies the spatial correction I mentioned. If we could pass a function (or functor) carrying the estimation parameters, that would be enough. You can check our plot recipe to get a rough idea of what is happening. Please let me know if you have any questions.
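A minimal sketch of the "pass a function or functor" design discussed above, written in plain Julia rather than with StatsBase or GeoStats.jl. Everything here (`plain_hist`, `WeightedHist`, `compute_hist`, the `estimator` keyword) is hypothetical and only illustrates the shape of such a hook; it is not the callback API PairPlots.jl ended up with. The per-sample weights stand in for a spatial correction and are simply taken as input.

```julia
# Default estimator: plain counts over half-open bins [edges[k], edges[k+1]).
function plain_hist(x::AbstractVector, edges::AbstractVector)
    counts = zeros(length(edges) - 1)
    for xi in x
        k = searchsortedlast(edges, xi)    # last bin whose left edge is <= xi
        if 1 <= k < length(edges)          # ignore values outside the edges
            counts[k] += 1
        end
    end
    return counts
end

# A functor (callable struct) carries its own estimation parameters:
# here, one weight per sample, e.g. a declustering weight computed elsewhere.
struct WeightedHist{W<:AbstractVector}
    weights::W
end

function (h::WeightedHist)(x::AbstractVector, edges::AbstractVector)
    counts = zeros(length(edges) - 1)
    for (xi, wi) in zip(x, h.weights)
        k = searchsortedlast(edges, xi)
        if 1 <= k < length(edges)
            counts[k] += wi
        end
    end
    return counts
end

# A recipe would just call whatever estimator it was handed.
compute_hist(x, edges; estimator = plain_hist) = estimator(x, edges)

x = [0.1, 0.2, 0.6, 0.9]
edges = [0.0, 0.5, 1.0]
compute_hist(x, edges)                                               # [2.0, 2.0]
compute_hist(x, edges; estimator = WeightedHist([1, 1, 0.5, 0.5]))   # [2.0, 1.0]
```

Because any callable with the signature `(x, edges) -> counts` fits, a plain function and a parameterized functor plug into the same keyword.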


I think I’ve settled on PairPlots.jl for the package name, but now I am not sure about the main function name. Should I keep corner(...) or change to pairplot(...) for consistency? Once that is settled I will register the first version. Thanks, I appreciate the feedback!


I think the name cornerplot is widely used, so I wouldn’t change the name of the function. You could also add corrplot in the future, like the one in StatsPlots.jl, so that your PairPlots.jl package is comprehensive.

The suffix *plot in the function name is important. People will be sure that something will be displayed on the screen right away. corner doesn’t imply a visualization.


Updates:
The package has been submitted to the General registry under the name PairPlots.jl and should hopefully be up in a few days.

Using PolygonOps.jl and Contour.jl, I now use the outermost contour to exclude points from the scatter instead of simply plotting them behind the histogram. This leads to around a 20x performance increase on large datasets and significantly smaller file sizes for the resulting figures (~7x in my experiments). The results also look much cleaner:

The contour lines themselves are still drawn by the plotting backend for the sake of user customization, but this may change in the future. There is a chance that Contour.jl and the plotting backend disagree slightly on the contours if they use different algorithms, but I think this is only a tiny effect.
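The point-exclusion trick described above can be sketched as follows. The original uses PolygonOps.jl and Contour.jl; to keep the example self-contained, this version swaps in a hand-written even-odd ray-casting test and assumes the outermost contour is already available as a simple (non-self-intersecting) polygon. The function names are hypothetical.

```julia
# Even-odd ray casting: cast a ray from p towards +x and count edge crossings.
# `poly` is a list of (x, y) vertices; the closing edge back to the first vertex is implied.
function inside_polygon(p::NTuple{2,Float64}, poly::Vector{NTuple{2,Float64}})
    x, y = p
    inside = false
    j = length(poly)
    for i in eachindex(poly)
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Edge (j -> i) straddles the horizontal line through y, and the
        # crossing point lies to the right of x: flip the inside/outside state.
        if (yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi
            inside = !inside
        end
        j = i
    end
    return inside
end

# Keep only the scatter points that fall outside the contour polygon.
function points_outside(xs, ys, poly)
    keep = [!inside_polygon((Float64(x), Float64(y)), poly) for (x, y) in zip(xs, ys)]
    return xs[keep], ys[keep]
end

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
points_outside([0.5, 2.0, -1.0], [0.5, 0.5, 0.2], square)   # ([2.0, -1.0], [0.5, 0.2])
```

Points inside the contour are hidden under the filled histogram anyway, so dropping them before plotting is what saves both rendering time and file size.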

Additionally, one can now directly pass MCMCChains.Chains objects to corner and everything will just work. The :iteration and :chain columns are filtered out.
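The column filtering mentioned above amounts to dropping bookkeeping columns before building the grid. This sketch does not use MCMCChains.Chains; a NamedTuple of vectors with the same column names stands in for it, and `parameter_columns` is a hypothetical name.

```julia
# Columns that describe the sampler rather than the model parameters.
const NON_PARAMETER_COLS = (:iteration, :chain)

# Return a NamedTuple containing only the parameter columns, in their original order.
function parameter_columns(table::NamedTuple)
    keep = Tuple(k for k in keys(table) if k ∉ NON_PARAMETER_COLS)
    return NamedTuple{keep}(Tuple(table[k] for k in keep))
end

t = (iteration = 1:3, chain = [1, 1, 1], μ = [0.1, 0.2, 0.3], σ = [1.0, 1.1, 0.9])
parameter_columns(t)   # (μ = [0.1, 0.2, 0.3], σ = [1.0, 1.1, 0.9])
```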

Finally, at the request of @juliohm one can now provide a callback to customize how the 1D and 2D histograms are calculated.

Thanks to all who provided feedback! I hope this tool can grow and be useful to as many people as possible.

Next ideas on the roadmap:

  • Option to add a super-title to the plots
  • Option to supply your own additional plot to fill the top right hand corner of the figure.
  • Denser grid
  • Option to overplot a linear fit through each off-diagonal subplot

@sefffal thanks for the update. I reiterate the importance of renaming the generic name corner to cornerplot. The former can mean all kinds of things, and it can certainly conflict with other packages, for example ones doing geometric processing.

@juliohm I did see your comments. The corner.py package and its corner function are ubiquitous in my field, and it’s important to me to present a familiar interface and output.

But also, I don’t buy the argument that all plotting functions must end in *plot. Most of the functions in Plots.jl don’t: scatter, histogram, and many others.

That part of the API is settled, but you can always import the module and access it as PairPlots.corner. That should be completely unambiguous.
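To illustrate the qualified-access point above: when two loaded modules both export a function with the same name, Julia will not resolve the bare name in the calling module, but a module-qualified call always works. The two modules below are hypothetical stand-ins for PairPlots and a geometry package.

```julia
# Two stand-in modules that both export `corner`.
module PlotsSide
export corner
corner(data) = "corner plot of $(length(data)) columns"
end

module GeometrySide
export corner
corner(data) = "corner point"
end

using .PlotsSide, .GeometrySide

# A bare `corner(...)` here would be reported as a conflict that must be qualified,
# but qualified calls are unambiguous:
PlotsSide.corner((a = 1, b = 2))   # returns "corner plot of 2 columns"
GeometrySide.corner(nothing)       # returns "corner point"
```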


I don’t know what field you are in, but in most fields these plots are never referred to as “corner”. I am a statistician, and if I polled my department about the terms “corner” and “cornerplot” without context, you can guess the results. There are dozens of packages in Python and Julia that give this function a more specific name, such as seaborn.pairplot, or StatsPlots.cornerplot among the packages you cited in your README. It doesn’t make much sense to me to insist on the name corner, but you are the owner of the package 🤷🏽‍♂️

Is that argument yours? I never said that; I meant cornerplot specifically.

On an unrelated issue, the fact that it currently depends on Plots.jl is a bummer because, as I mentioned on GitHub, no major project will be able to depend on it. The naming issue is secondary compared to the Plots.jl dependency. And as you also know, I was able to write a similar recipe in GeoStatsBase.cornerplot without this heavy dependency.

Anyways, just sharing my thoughts here, feel free to develop the package as you wish. It can still be useful as a package that people import and use directly in final scripts, similar to what we see in the Python ecosystem.

With respect to Plots, the initial version that will be registered only depends on RecipesBase. I am working on an improvement though that makes the layout much denser, packing more information into a tight space.

So far the best way I’ve found to do this is using inset subplots but this appears to require using Plots itself.

Does anyone know if it’s possible to specify subplot positions without the bbox function?
I am of course aware of grid layouts, but they add a lot of padding between subplots, and in grid layouts the subplots resize when you add e.g. titles.

I’m guessing my best bet will just be to gate the functionality of this package behind Requires.jl.
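The explicit positioning asked about above boils down to computing each panel's box yourself instead of letting a padded grid layout do it. This sketch only computes the numbers: (left, bottom, width, height) as fractions of the parent area, measured from the bottom-left, with no padding between panels. Which call consumes these numbers (an inset bounding box or something else) is backend-specific and not shown; `tight_grid_boxes` and its `margin` keyword are hypothetical.

```julia
# Zero-padding boxes for an n×n grid, keyed by (row, col), row 1 at the top.
# Each box is (left, bottom, width, height) as fractions of the parent area.
function tight_grid_boxes(n::Integer; margin::Real = 0.0)
    w = (1 - 2margin) / n                  # square panels, all the same size
    boxes = Dict{Tuple{Int,Int},NTuple{4,Float64}}()
    for row in 1:n, col in 1:n
        left = margin + (col - 1) * w
        bottom = margin + (n - row) * w    # flip so row 1 sits at the top
        boxes[(row, col)] = (left, bottom, w, w)
    end
    return boxes
end

boxes = tight_grid_boxes(3)
boxes[(3, 1)]   # bottom-left panel: (0.0, 0.0, 1/3, 1/3)
```

Since titles and labels are not part of this calculation, adding them does not resize the panels, which is exactly the behavior the grid layouts above lack.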

A few updates since the last post on this thread:

  • Denser layout made by abusing inset_subplot

  • Option to highlight a 1D or 2D histogram in the top right corner

  • Experimental option to place an arbitrary subplot in the top right corner

  • Contours now drawn directly by Contour.jl instead of the plotting backend

  • Visual improvements to contours that have sharp kinks or intersect the edges of the histogram

  • Additional bug fixes and small visual fixes

Thanks, and I hope others find this package useful!


The package is quite nice already and I will deprecate our own recipe to start using PairPlots.jl. Thanks for the updates and improvements!


Also, people may find the combination TableTransforms.jl + PairPlots.jl quite useful for designing statistical pipelines:

@sefffal, note that the last example in our README produces some warnings in PairPlots.jl due to column names with underscore suffix such as a_.


Small update: basic support for unicode column names

This package already allows you to customize the column labels (and optionally, units) with arbitrary LaTeX, but didn’t support columns with unicode symbols in their names.
As a simple workaround, the package now attempts to convert basic unicode symbols back into LaTeX using the REPL completions list. This probably won’t work for e.g. subscripts, but you can still use e.g. a_2 as the column name or override the label manually.

Example:

using Plots, PairPlots

# Generate some data to visualize
N = 100_000
α = [2randn(N÷2) .+ 6; randn(N÷2)]
β = [3randn(N÷2); 2randn(N÷2)]
γ = randn(N)
δ = β .+ 0.6randn(N)
# Pass data in a format compatible with Tables.jl
# Here, simply a named tuple of vectors.
table = (;α, β, γ, δ)
PairPlots.corner(table)

N.B. Plots doesn’t support combining unicode with LaTeX directly, and we need to use LaTeX for the title formatting - hence the workaround.
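The unicode-to-LaTeX fallback described above can be sketched by inverting the REPL's tab-completion table. This assumes `REPL.REPLCompletions.latex_symbols` (an internal Dict from names like "\\alpha" to symbols like "α") is present in your Julia version; it is not necessarily how PairPlots.jl implements the conversion, and `unicode_to_latex` is a hypothetical name.

```julia
import REPL

# Reverse lookup: symbol => LaTeX command. Several commands can map to the same
# symbol, so keep the shortest one (ties broken alphabetically) for determinism.
const UNICODE_TO_LATEX = let rev = Dict{String,String}()
    for (cmd, sym) in REPL.REPLCompletions.latex_symbols
        old = get(rev, sym, nothing)
        if old === nothing || (length(cmd), cmd) < (length(old), old)
            rev[sym] = cmd
        end
    end
    rev
end

# Replace each non-ASCII character that has a known command; leave everything else.
function unicode_to_latex(name::AbstractString)
    io = IOBuffer()
    for c in name
        cmd = isascii(c) ? nothing : get(UNICODE_TO_LATEX, string(c), nothing)
        if cmd === nothing
            print(io, c)            # ASCII or unknown characters pass through unchanged
        else
            print(io, cmd, ' ')     # space separates the command from what follows
        end
    end
    return String(rstrip(String(take!(io))))
end

unicode_to_latex("α")     # a command such as "\\alpha"
unicode_to_latex("a_2")   # "a_2", unchanged
```

As noted above, characters without a completion entry simply pass through, which is why subscripts and other exotic symbols may still need a manual label.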


Why not use Latexify.jl?

Hmm I didn’t consider that package - that might work, thanks!

Release Preview: PairPlots.jl rewrite based on Makie!

Motivated by the vastly improved time to first Makie-plot thanks to #47184, I’ve rewritten PairPlots to support Makie.

As a bonus, this enables multiple series, composable visualizations, much easier customization, and interactivity.
My main question, to anyone who is using it at the moment: would this change of plotting package be overly disruptive? I would bump the major version of the package so the existing Plots.jl-based code could continue to be used by adding an older version.

Here’s a peek of the default style:
pairplot(table)


It mostly looks the same as the previous version by default, except that it uses hexagonal binning and a smoothed kernel density estimate. The previous square histograms are still supported, of course. (Note also the rich text label, thanks to Makie.)
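For readers unfamiliar with the smoothed diagonal curves mentioned above, this is a textbook 1D Gaussian kernel density estimate with the normal-reference rule-of-thumb bandwidth h = 1.06·σ·n^(-1/5). It is a generic sketch, not necessarily the estimator or bandwidth PairPlots.jl uses, and `gaussian_kde` is a hypothetical name.

```julia
using Statistics

# Evaluate a Gaussian KDE of `samples` at every point of `grid`.
function gaussian_kde(samples::AbstractVector{<:Real}, grid::AbstractVector{<:Real})
    n = length(samples)
    h = 1.06 * std(samples) * n^(-1 / 5)   # normal-reference rule-of-thumb bandwidth
    c = 1 / (n * h * sqrt(2π))             # normalizes the summed kernels to unit area
    return [c * sum(exp(-((x - s) / h)^2 / 2) for s in samples) for x in grid]
end

samples = collect(range(-1, 1; length = 101))
grid = range(-6, 6; length = 1201)
density = gaussian_kde(samples, grid)      # integrates to ≈ 1 over the grid
```

Unlike a step histogram, the result does not depend on where bin edges fall, only on the bandwidth h.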

Multiple series: basically just pairplot(table1, table2). Note that the tables can have partially disjoint column names; here, column c isn’t in the second table.
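Handling partially disjoint columns, as described above, can be reduced to two steps: build the grid from the union of all column names, then record which series provides which variable, so a series is skipped in any panel involving a column it lacks. A sketch using NamedTuples of vectors as stand-ins for Tables.jl sources; `shared_layout` is hypothetical and not the PairPlots.jl internals.

```julia
# Union of column names (first-seen order) plus, per series, which of them it has.
function shared_layout(tables::NamedTuple...)
    cols = unique(Iterators.flatten(keys(t) for t in tables))
    present = [[c in keys(t) for c in cols] for t in tables]
    return cols, present
end

t1 = (a = zeros(3), b = zeros(3), c = zeros(3))
t2 = (a = ones(3), b = ones(3))
cols, present = shared_layout(t1, t2)
# cols == [:a, :b, :c]; present[2] == [true, true, false], so series 2 skips every panel with c
```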

Flexible layouts: thanks to Makie’s awesome layout engine, pair plots can be easily nested within larger figures.

The biggest changes, though, are invisible. Getting all those little subplots to lay out correctly in Plots.jl was very hacky. By switching to Makie, the code is much cleaner and more future-proof.

The code is available on the PairPlots.jl repo under the makie branch for now. Documentation to come.
