[ANN] A Package for Generating Corner Plots (PairPlots.jl)

Impressive!

Thanks for the package, keeps getting better!

Would it make sense to make the default bins narrower? Both in the examples you demonstrate above and in my recent experience, they seem too wide, unless there's just a single bell-like curve covering the majority of the range.

Also, what do you think about labeling the first row as well? It would be consistent, and it would avoid scrolling to the bottom when there are many variables.

Thanks for the kind words!

@aplavin I'll see about increasing the bin count heuristic a bit. Maybe 50% more bins for that sample size would be good.

I see your point about a label near the top. If it's just a short variable name, that can work well. But for longer labels like the ones I used in that screenshot, it wouldn't be ideal.
Maybe the variable name could be placed above the credible interval text.

At some point we may want to export a dict with a few different themes.

I am also eagerly awaiting more docs on the new Makie declarative API. I would very much like to make a live updating corner plot during sampling!

Oh, I probably didn't explain myself clearly enough. I mean making the first row uniform with the others by adding the first variable's label on the left of the row. All other rows have their corresponding variable labeled on the left, aside from the first one. I don't know if that's intended or an accidental omission.

As for duplicating column labels at the top, this does seem suboptimal for the default, I agree.

Here are papers with an "optimal bin size" heuristic.

https://scholarworks.utep.edu/cgi/viewcontent.cgi?article=2165&context=cs_techrep

https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/WandBinWidth.pdf
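
For anyone who wants to experiment with bin counts before a heuristic lands in the package, here is a minimal sketch of the well-known Freedman-Diaconis rule (bin width `h = 2 * IQR * n^(-1/3)`). To be clear, this is neither the method from the papers linked above nor the heuristic PairPlots actually uses; the function names `fd_bin_width` and `fd_bin_count` are made up for illustration.

```julia
using Statistics  # stdlib: provides quantile

# Freedman-Diaconis bin width: h = 2 * IQR / cbrt(n).
# Hypothetical helper, not part of PairPlots.
function fd_bin_width(xs::AbstractVector{<:Real})
    n = length(xs)
    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    return 2 * iqr / cbrt(n)
end

# Number of bins implied by that width over the data range.
# Falls back to a single bin when the data are degenerate (zero IQR or zero range).
function fd_bin_count(xs::AbstractVector{<:Real})
    h = fd_bin_width(xs)
    lo, hi = extrema(xs)
    (h > 0 && hi > lo) || return 1
    return max(1, ceil(Int, (hi - lo) / h))
end
```

Scaling the returned count by 1.5 would mimic the "50% more bins" idea mentioned earlier in the thread. Because the rule depends on the IQR rather than the standard deviation, it tends to give narrower bins for multimodal or heavy-tailed samples, which is the case where the defaults looked too wide.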

I think I'm understanding now, but that would mean putting the x axis variable name along the y axis. The proper unit for that axis, if it is labeled at all, is counts, no?

Here is a case study. In order, these are screenshots from corner.py, chainconsumer.py, and statsplots.jl.

image

By contrast, corrplot from StatsPlots seems to differ from pairplot from StatsPlots:
image

New feature: correlation values

Thanks to @ericphanson for contributing this feature!

One can now easily add an annotation to each subplot displaying the correlation between each pair of variables:

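using CairoMakie, PairPlots  # a Makie backend is needed to render the figure

# df is assumed to be an existing table, e.g. a DataFrame
pairplot(df=>(
    PairPlots.single_series_default_viz...,
    PairPlots.Correlation()
))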

The text position, number of digits displayed, font, etc. are all customizable. You can also swap out the cor function for any function accepting two variables and returning a number:

# Custom statistic: number of samples where x exceeds y
count_X_gt_Y(xs, ys) = count(xs .> ys)
pairplot(df=>(
    PairPlots.single_series_default_viz...,
    PairPlots.Calculation(count_X_gt_Y, digits=0)
))
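
As another illustration of "any function accepting two variables and returning a number", here is a rough sketch of a rank-based (Spearman-style) correlation. The names `simple_ranks` and `spearman_no_ties` are hypothetical, and the ranking is deliberately simplified: ties are broken by position rather than averaged, so it only matches the textbook Spearman coefficient when there are no repeated values.

```julia
using Statistics  # stdlib: provides cor

# Rank of each element (1 = smallest).
# Assumption: ties are broken by position, not averaged.
simple_ranks(xs) = invperm(sortperm(xs))

# Pearson correlation of the ranks, i.e. Spearman's rho when there are no ties.
spearman_no_ties(xs, ys) = cor(simple_ranks(xs), simple_ranks(ys))
```

Following the pattern above, it should be possible to pass this as `PairPlots.Calculation(spearman_no_ties, digits=2)` in place of `count_X_gt_Y`.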

New documentation page built with DocumenterVitepress is now live!

Motivation: I'm planning to publish a Python version of the package sometime soon, and DocumenterVitepress allows one to write example code for multiple languages under different tabs.
