[ANN] A Package for Generating Corner Plots (PairPlots.jl)

Corner plots are a useful way of visualizing high-dimensional data by presenting a grid of 2D histograms comparing each pair of variables. This package provides a flexible interface for generating these grids, and by default it reproduces the output of corner.py as closely as possible.
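To make the grid structure concrete, here is a minimal sketch of which panels a corner plot contains for a given set of variables. The helper `corner_panels` is hypothetical and is not part of PairPlots.jl; it only enumerates the lower triangle of the grid, with 1D histograms on the diagonal and 2D histograms below it.

```julia
# Hypothetical helper (not part of PairPlots.jl): list the panels of a corner plot.
# Each entry is (row, column, x variable, y variable, panel kind).
function corner_panels(names)
    panels = Tuple{Int,Int,Symbol,Symbol,Symbol}[]
    for row in eachindex(names), col in 1:row
        # Diagonal panels show one variable; panels below the diagonal show a pair.
        kind = row == col ? :hist1d : :hist2d
        push!(panels, (row, col, names[col], names[row], kind))
    end
    return panels
end

for (row, col, x, y, kind) in corner_panels([:a, :b, :c])
    label = kind == :hist1d ? "1D histogram of $x" : "2D histogram of $x vs $y"
    println("row $row, col $col: $label")
end
```

For n variables this gives n diagonal panels and n(n-1)/2 off-diagonal panels; the upper right of the grid is left empty.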

Usage:

using Plots, PairPlots

a = randn(1000)
b = randn(1000)
data = (;a, b)

corner(data)

This is built on RecipesBase rather than Plots directly, so it should be fairly lightweight. Currently only tested with the GR backend.

Here are a few example plots:

The default is to overplot the datapoints, 2D histograms, and contour plots, but these can be turned on or off. It’s easy to customize each series and even swap out the series types. For instance, the 1D histograms can be changed from :step to :line just by passing in a new seriestype via hist_kwargs. Or, see the README for an example of using 3D wireframes on the off-diagonals.

This package isn’t yet registered since I haven’t settled on a name. The obvious CornerPlots turns out to be taken by the existing CornerPlot.jl, a package with similar goals for Gadfly. Name suggestions welcome!

Edit: now called PairPlots.jl

That looks like excellent stuff.

Suggestion: Scatterplot Matrices

Thanks, that’s not bad! That same link also mentions PairPlots, which would be pretty easy to remember.

Thank you @sefffal for the contribution. We have our own corner plot recipe in GeoStats.jl as you mentioned in your README and could migrate it to a separate package like yours if it is maintained and expanded with more options in the future. In our case, we correct the histogram using the spatial coordinates of the samples: Plotting · GeoStats.jl

Would your recipe allow custom estimators of the histograms in the future?

Hi, I would be happy to add that functionality. Currently I calculate the histograms using the fit function in StatsBase. If we added a way to control that fit, would that be sufficient? Or would it be better to directly accept a precalculated histogram for each plot?

It would be great if we could pass in a function to compute the histogram on the fly for each entry in the corner plot. We have an EmpiricalHistogram type in GeoStats.jl that stores a classical Histogram inside but applies a spatial correction, as I mentioned. If we could pass a function (or functor) carrying the parameters of estimation, that would be enough. You can check our plot recipe to get a rough idea of what is happening. Please let me know if you have any questions.
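As a rough illustration of the "function or functor" idea, here is a minimal sketch of a callable struct that carries its estimation parameters and returns bin edges and weighted counts. The type `WeightedHist` and its return convention are assumptions made for this example; it is neither the EmpiricalHistogram from GeoStats.jl nor the callback signature PairPlots.jl expects.

```julia
# Hypothetical functor: a struct that stores estimation parameters
# and can be called like a function on a data vector.
struct WeightedHist
    nbins::Int
    weights::Vector{Float64}   # one weight per sample, e.g. a spatial correction
end

# Calling the functor returns (bin edges, weighted counts).
function (h::WeightedHist)(x::AbstractVector{<:Real})
    lo, hi = extrema(x)
    edges = range(lo, hi; length = h.nbins + 1)
    counts = zeros(h.nbins)
    width = (hi - lo) / h.nbins
    for (xi, wi) in zip(x, h.weights)
        # Map the sample to a bin; the maximum value goes into the last bin.
        bin = width == 0 ? 1 : min(h.nbins, floor(Int, (xi - lo) / width) + 1)
        counts[bin] += wi
    end
    return edges, counts
end

x = [0.0, 0.1, 0.4, 0.6, 1.0]
estimator = WeightedHist(2, [1.0, 1.0, 2.0, 0.5, 0.5])
edges, counts = estimator(x)
```

Because the parameters live in the struct, a plotting function would only need to accept any callable and apply it to each column, without knowing how the estimate is computed.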

I think I’ve settled on PairPlots.jl for the package name, but now I am not sure about the main function name. Should I keep corner(...) or change to pairplot(...) for consistency? Once that is settled I will register the first version. Thanks, I appreciate the feedback!

I think the name cornerplot is widely used, so I wouldn’t change the name of the function. You could also add corrplot in the future, like the one in StatsPlots.jl, so that your PairPlots.jl package is comprehensive.

The suffix *plot in the function name is important. People will be sure that something will be displayed on the screen right away. corner doesn’t imply a visualization.

Updates:
The package has been submitted to the General registry under the name PairPlots.jl and should hopefully be up in a few days.

Using PolygonOps.jl and Contour.jl, I now use the outermost contour to exclude points from the scatter instead of simply plotting them behind the histogram. This leads to around a 20x performance increase on large datasets and significantly smaller file sizes for the resulting figures (~7x in my experiments). The results also look much cleaner:

The contour lines themselves are still drawn by the plotting backend for the sake of user customization, but this may change in the future. There is a chance that Contour.jl and the plotting backend disagree slightly on the contours if they use different algorithms, but I think this is really only a tiny effect.
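For readers curious about the point-exclusion step, here is a minimal sketch of the idea using a hand-written ray-casting test. This is not the PolygonOps.jl API and not the package's actual implementation; the function `inside_polygon` and the square "contour" are assumptions chosen to keep the example self-contained.

```julia
# Ray-casting test: is the point (px, py) inside the closed polygon
# whose vertices are (xs[i], ys[i])?
function inside_polygon(px, py, xs, ys)
    inside = false
    n = length(xs)
    j = n
    for i in 1:n
        # Does the edge from vertex j to vertex i straddle the horizontal line y = py?
        if (ys[i] > py) != (ys[j] > py)
            # x-coordinate where the edge crosses that line
            xcross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            if px < xcross
                inside = !inside
            end
        end
        j = i
    end
    return inside
end

# Hypothetical outermost contour: a unit square around the dense region.
cx = [0.0, 1.0, 1.0, 0.0]
cy = [0.0, 0.0, 1.0, 1.0]

points = [(0.5, 0.5), (0.2, 0.9), (1.5, 0.5), (-0.3, 0.1)]

# Keep only the points outside the contour for the scatter series.
outliers = [p for p in points if !inside_polygon(p[1], p[2], cx, cy)]
```

Only the outliers are then handed to the scatter series, which is where the savings in drawing time and file size would come from on large datasets.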

Additionally, one can now directly pass MCMCChains.Chains objects to corner and everything will just work. The :iteration and :chain columns are filtered out.
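The column filtering can be pictured with a plain NamedTuple standing in for the chain. This is only a sketch: `drop_bookkeeping` is a hypothetical helper, and it does not use the MCMCChains.jl API.

```julia
# Hypothetical stand-in for chain samples stored as a table of columns.
samples = (; iteration = 1:4, chain = fill(1, 4), mu = randn(4), sigma = rand(4))

# Drop bookkeeping columns, keeping only the sampled parameters.
function drop_bookkeeping(table::NamedTuple; skip = (:iteration, :chain))
    keep = filter(name -> name ∉ skip, keys(table))
    return NamedTuple{keep}(table)
end

params = drop_bookkeeping(samples)
println(keys(params))
```

The remaining columns are the ones that would appear as rows and columns of the corner plot.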

Finally, at the request of @juliohm one can now provide a callback to customize how the 1D and 2D histograms are calculated.

Thanks to all who provided feedback! I hope this tool can grow and be useful to as many people as possible.

Next ideas on the roadmap:

  • Option to add a super-title to the plots
  • Option to supply your own additional plot to fill the top right-hand corner of the figure
  • Denser grid
  • Option to overplot a linear fit through each off-diagonal subplot

@sefffal thanks for the update. I reiterate the importance of renaming the generic name corner to cornerplot. The first name can mean all kinds of things, and it can certainly conflict with other packages doing geometric processing, for example.

@juliohm I did see your comments. Corner.py and its corner function are ubiquitous in my field and it’s important to me to present a familiar interface and output.

But also, I don’t buy the argument that all plotting functions must end in *plot. Most of the functions in Plots.jl don’t: scatter, histogram, and many others.

That part of the API is settled, but you can always import the module and access it as PairPlots.corner. That should be completely unambiguous.

I don’t know what field you are in, but in most fields these plots are never referred to as “corner”. I am a statistician, and if I ran a poll in my department about the terms “corner” and “cornerplot” without context, you can guess the results. There are dozens of packages in Python and Julia that give this function a more specific name, such as seaborn.pairplot, and the packages you cited in your README, like StatsPlots.cornerplot. It doesn’t make much sense to me to insist on the name corner, but you are the owner of the package.

This argument is yours? Because I never said that. I meant cornerplot specifically.

On an unrelated issue, the fact that it currently depends on Plots.jl is a bummer because, as I mentioned on GitHub, no major project will be able to depend on it. The naming issue becomes secondary compared to the Plots.jl dependency. And as you also know, I was able to write a similar recipe in GeoStatsBase.cornerplot without this heavy dependency.

Anyways, just sharing my thoughts here, feel free to develop the package as you wish. It can still be useful as a package that people import and use directly in final scripts, similar to what we see in the Python ecosystem.

With respect to Plots, the initial version that will be registered only depends on RecipesBase. I am working on an improvement though that makes the layout much denser, packing more information into a tight space.

So far, the best way I’ve found to do this is using inset subplots, but this appears to require using Plots itself.

Does anyone know if it’s possible to specify subplot positions without the bbox function?
I am of course aware of grid layouts, but they add a lot of padding between subplots, and in grid layouts the subplots adjust their size when adding e.g. titles.

I’m guessing my best bet will just be to gate the functionality of this package behind Requires.