[ANN] A Package for Generating Corner Plots (PairPlots.jl)

vancleve · February 9, 2024, 7:23pm

Impressive!

aplavin · February 9, 2024, 7:37pm

Thanks for the package, keeps getting better!

Would it make sense to make the default bins narrower? Both in the examples you demonstrate above and in my recent experience, they seem too wide, unless there’s just a single bell-like curve covering the majority of the range.

Also, what do you think about labeling the first row as well? For consistency, and so that one doesn’t have to scroll to the bottom when there are many variables.

sefffal · February 10, 2024, 6:34pm

Thanks for the kind words!

@aplavin I’ll see about increasing the bin count heuristic a bit. Maybe 50% more bins for a given sample size would be good.

I see your point about a label near the top. If it’s just a short variable name that can work well. But for longer labels like the ones I used in that screenshot, it wouldn’t be ideal.
Maybe the variable name could be placed above the credible interval text.

At some point we may want to export a dict with a few different themes.

I am also eagerly awaiting more docs on the new Makie declarative API. I would very much like to make a live updating corner plot during sampling!

aplavin · February 10, 2024, 9:51pm

Oh, I probably didn’t explain myself clearly enough. I mean making the first row uniform by adding the first variable label on the left of the row. All other rows have their corresponding variable labeled on the left, aside from the first one. I don’t know if that’s intended or an accidental omission.

As for duplicating column labels at the top, this does seem suboptimal for the default, I agree.

jar1 · February 10, 2024, 9:54pm

Here are papers with an “optimal bin size” heuristic.

https://scholarworks.utep.edu/cgi/viewcontent.cgi?article=2165&context=cs_techrep

https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/WandBinWidth.pdf
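
For orientation, here is a minimal sketch of two widely used bin-width rules, Freedman-Diaconis and Scott's normal-reference rule. These are standard textbook heuristics, not necessarily the ones recommended in the linked papers, and not what PairPlots uses internally. The function names `fd_binwidth`, `scott_binwidth`, and `bincount` are made up for illustration.

```julia
using Statistics

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
function fd_binwidth(xs::AbstractVector{<:Real})
    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    return 2 * iqr / cbrt(length(xs))
end

# Scott's normal-reference rule: bin width = 3.49 * std / n^(1/3)
scott_binwidth(xs::AbstractVector{<:Real}) = 3.49 * std(xs) / cbrt(length(xs))

# Convert a bin width into a bin count over the data range (at least one bin)
function bincount(xs::AbstractVector{<:Real}, width::Real)
    lo, hi = extrema(xs)
    (width > 0 && hi > lo) || return 1
    return max(1, ceil(Int, (hi - lo) / width))
end

xs = collect(1.0:1000.0)   # deterministic stand-in for real samples
nbins_fd = bincount(xs, fd_binwidth(xs))
nbins_scott = bincount(xs, scott_binwidth(xs))
```

Both rules shrink the bin width as n^(-1/3), so the bin count grows slowly with sample size. The Freedman-Diaconis variant uses the interquartile range and is therefore less sensitive to outliers than Scott's rule.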

sefffal · February 10, 2024, 10:01pm

I think I’m understanding now, but that would mean putting the x axis variable name along the y axis. The proper unit for that axis, if it is labeled at all, is counts, no?

sefffal · February 10, 2024, 10:05pm

Here is a case study. In order, these are screenshots from corner.py, chainconsumer.py, and statsplots.jl.

By contrast, corrplot from StatsPlots seems to differ from pairplot from StatsPlots:

sefffal · March 15, 2024, 3:06pm

New feature: correlation values

Thanks to @ericphanson for contributing this feature!

One can now easily add an annotation to each subplot displaying the correlation between each two variables:

pairplot(df=>(
    PairPlots.single_series_default_viz...,
    PairPlots.Correlation()
))

The text position, number of digits displayed, font, etc. are all customizable. You can also swap out the cor function for any function accepting two variables and returning a number:

count_X_gt_Y(xs,ys) = count(xs .> ys)
pairplot(df=>(
    PairPlots.single_series_default_viz...,
    PairPlots.Calculation(count_X_gt_Y, digits=0)
))

sefffal · March 20, 2024, 3:19pm

New documentation page built with DocumenterVitepress is now live!

Motivation: I’m planning to publish a Python version of the package sometime soon, and DocumenterVitepress allows one to write example code for multiple languages under different tabs.

sefffal · August 27, 2024, 5:02pm

New Release

This release contains two new features: improved layout support, and improved bin sizing customization.

You can now invert pair plots to use up space in the top right corner using topright=true and bottomleft=false.
You can now also pass specific bin counts or ranges for each series by providing bins=Dict(:colname => 10) or bins=Dict(:colname => -10:1:10).

FYI, if you are using a custom histogram calculation function (e.g. @juliohm), you shouldn’t have to change anything out of the box. If, however, you want that function to support custom bin ranges (in addition to bin counts), you can detect this by checking whether the bins arguments are numbers or ranges.
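
A minimal sketch of that number-versus-range check, using multiple dispatch. The names `bin_edges` and `my_hist` and their signatures are hypothetical; the exact interface PairPlots expects from a custom histogram function is not shown here, so treat this only as an illustration of the detection step.

```julia
# Hypothetical helper: turn a `bins` argument into bin edges.
# An integer is treated as a bin count, a range as explicit edges.
bin_edges(xs, bins::Integer) = range(minimum(xs), maximum(xs); length=bins + 1)
bin_edges(xs, bins::AbstractRange) = bins

# Hypothetical 1D histogram: returns the bin edges and the count in each bin.
function my_hist(xs::AbstractVector{<:Real}, bins)
    edges = bin_edges(xs, bins)
    counts = zeros(Int, length(edges) - 1)
    for x in xs
        # Drop samples that fall outside the requested range
        first(edges) <= x <= last(edges) || continue
        # Index of the last edge <= x; clamp so x == last(edges) lands in the final bin
        i = min(searchsortedlast(edges, x), length(counts))
        counts[i] += 1
    end
    return edges, counts
end

xs = [0.5, 1.5, 1.5, 2.5, 9.0]
edges_n, counts_n = my_hist(xs, 2)        # a bin count
edges_r, counts_r = my_hist(xs, 0:1:3)    # an explicit range of edges
```

With a bin count, the edges span the data; with a range, samples outside the range (here `9.0`) are simply not counted.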

Gallery:

marcobonici · August 28, 2024, 3:16pm

Thank you @sefffal !
Here is a plot I produced with the new version of PairPlots.jl

Playing with it a bit, I obtained this plot (which I think we can call publication quality!).

Thanks again for providing the feature I requested so quickly!

sefffal · December 2, 2024, 10:07pm

Upcoming Change: Effective sample size for contours and histogram bins

Hello all, unless I hear strong opinions otherwise, the next minor version of PairPlots will calculate the effective sample size of each variable to determine the number of bins, and kernel density estimate bandwidth.

This is only a change to the defaults, as you can already override all of these parameters if you wish.

This will make PairPlots robust when plotting MCMC chains with high autocorrelation etc. Here is an example before / after:

This should only make a noticeable difference if you are plotting with the default visualization parameters, and are looking at data with high autocorrelation vs time.
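
To make the idea concrete, here is a rough sketch of one common way to estimate an effective sample size from the autocorrelation of a chain. It is a simplified textbook estimator, not the one PairPlots uses; `autocor_at` and `ess_estimate` are illustrative names.

```julia
using Statistics

# Sample autocorrelation of `xs` at a given lag.
function autocor_at(xs::AbstractVector{<:Real}, lag::Integer)
    n = length(xs)
    mu = mean(xs)
    num = sum((xs[i] - mu) * (xs[i + lag] - mu) for i in 1:(n - lag))
    den = sum(abs2, xs .- mu)
    return num / den
end

# Crude ESS estimate: n / (1 + 2 * sum of autocorrelations),
# truncating the sum at the first non-positive autocorrelation.
function ess_estimate(xs::AbstractVector{<:Real})
    n = length(xs)
    tau = 1.0
    for lag in 1:(n - 1)
        rho = autocor_at(xs, lag)
        rho <= 0 && break
        tau += 2 * rho
    end
    return n / tau
end

# Deterministic demo: repeating every value 10 times mimics a "sticky" chain
# in which consecutive samples are highly correlated.
base = [sin(1.7k) + cos(0.3k^2) for k in 1:200]
sticky = repeat(base; inner=10)
ess_sticky = ess_estimate(sticky)
```

The sticky chain has 2000 samples but a much smaller estimated ESS. Feeding an ESS rather than the raw sample count into a bin-count or bandwidth heuristic yields coarser bins for strongly autocorrelated chains.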

Let me know below if you have objections to changing this default.

simonsteiger · December 3, 2024, 6:08am

I’ve just discovered this package and it looks absolutely wonderful! Thank you for the great work!

I would usually write my own (much less pretty!) pair plots to diagnose MCMC chains, and highlight divergent transitions with a different color to help me reason about where the sampler might be struggling.

I looked at the docs page for plotting MCMC chains but divergent transitions aren’t mentioned there. Is there such an option? If not, I think it would be an awesome optional feature for plotting Chains objects.

sefffal · December 4, 2024, 1:45pm

Hello @simonsteiger ! I also find it useful to plot divergent transitions.

Unfortunately, to the best of my knowledge, that information isn’t recorded by e.g. Turing in the chains object, so there is no way to access it unless one has used AdvancedHMC directly.

Maybe we should open an issue on Turing to request that the numerical error flag be included as an internal column on the chains.

sefffal · December 4, 2024, 4:15pm

My mistake @simonsteiger — Turing does include that information already.
Here is an example of how you can flag divergent transitions!

pairplot(
    # All samples, as usual
    PairPlots.Series(chain, color=:black) => PairPlots.single_series_default_viz,
    # Divergent samples
    PairPlots.Series(
        chain[chain[:numerical_error][:] .> 0, :, :],
        color=:red
    ) => (
        PairPlots.Scatter(markersize=10),
        PairPlots.MarginStepHist()
    )
)

sefffal · December 4, 2024, 4:16pm

Here is the result:

simonsteiger · December 4, 2024, 8:53pm

Super cool! Thanks a lot for providing the example code! I’ll make sure to use this in the future.

sefffal · December 17, 2024, 6:28pm

Breaking Release: 3.0.0

Requires Julia 1.10 (previously, 1.7), Makie 0.21.18+
Renames the alias PairPlots.Correlation() to PairPlots.PearsonCorrelation()
Adds PairPlots.MarginQuantileText()
Removes PairPlots.MarginConfidenceLimits(), in favour of PairPlots.MarginQuantileText() and PairPlots.MarginQuantileLines()

Improvements

This release fixes a styling issue I found rather vexing. This required a small breaking change to the text formatting api, so I included a few other breaking naming changes that had been requested.

We now use the new Makie rich text feature Makie.subsup to format the confidence/credible intervals text. Previously, we had to use hacky LaTeX formatting for this.

This (1) makes the font consistent with all the rest of the text in the plot, and (2) allows the font to be coloured when showing multiple series (which was previously not useful when the text always had to be black).

Before vs After: (look closely at the labels along the diagonal)

New: quantile text for multiple series

To allow customizing the text separately from the lines, I split PairPlots.MarginConfidenceLimits into two separate components: PairPlots.MarginQuantileText and PairPlots.MarginQuantileLines.

You can now make the text bold, italic, coloured, etc. as your heart desires.

The naming of CredibleInterval vs ConfidenceInterval proved to be controversial, so I have now simply adopted Quantile....

Support for Julia < 1.10 was dropped, since maintaining backwards compatibility for the @sprintf variable length formatting (@sprintf("%.*f", digits, value), introduced in Julia 1.10) was a chore. Folks on earlier Julia versions can of course continue to use the previous release without issue.
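
For reference, a small sketch of the variable-precision formatting mentioned above, next to a runtime-format alternative that, as far as I know, also works on Julia versions with the rewritten Printf (1.6 and later). The variable names are illustrative only.

```julia
using Printf

value = 3.14159
n_decimals = 2

# Julia 1.10+: the `*` in the format takes the precision from the argument list
s_dynamic = @sprintf("%.*f", n_decimals, value)

# Alternative: build the format string at runtime, then apply it
fmt = Printf.Format("%.$(n_decimals)f")
s_runtime = Printf.format(fmt, value)
```

Both calls produce the same string; the macro form avoids constructing a new format object for each distinct number of digits.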
