[ANN] A Package for Generating Corner Plots (PairPlots.jl)

juliohm · December 27, 2022, 8:35pm

Awesome to see the new release! Looking forward to trying it out

jules · December 28, 2022, 6:55am

Looks really nice! By the way, I almost added super-above-subscripts to rich but then removed it again before merging because the possible line break semantics weren’t clear to me. But seeing this real world use case, maybe I should think about how to get it back in again. Because it would be nicer not having to mix latex and normal font styles just for that.

jules · December 28, 2022, 9:08am

And another comment, with weak dependencies coming in Julia 1.9, you could factor the Makie support out into a submodule that only loads the code when Makie is loaded as well. Then users of Plots wouldn’t need to pay the latency increase.

juliohm · December 28, 2022, 10:54am

@sefffal if you feel that the package will target Makie moving forward, you can consider moving it to MakieOrg for greater visibility:

sefffal · December 28, 2022, 2:40pm

That would be great! It would equally be really nice if there was a way to use the LaTeX functionality with the default font (unless, say, a glyph was not available). That way the LaTeXified axes labels could match too.

Great point. I am still debating whether to keep the Plots.jl functionality or not. All the bookkeeping needed to lay out the subplots makes the code very messy.
To make the Plots.jl version work, I actually make all the plots “inset subplots” with absolute positioning, all inside a single large plot with hidden axes. Then I have to carefully manage the axis limits manually between all plots. Not so nice!

sefffal · December 28, 2022, 9:41pm

Release is merged and tagged, and a pile of new docs is live here!

I also added a basic Legend feature that can help with multiple series.

jules · December 29, 2022, 10:20am

Latex is a bit more complicated because glyphs like integral signs need to be adjusted so they look good with super and subscripts, and these relationships differ per font. Text layouting in Makie is much simpler and only concatenates letters left to right without kerning.

sefffal · February 27, 2023, 4:33pm

Announcing PairPlots.jl version 1.0.0!

In my opinion all the basic functionality of PairPlots is now present, so to avoid falling into the 0ver trap, I’m tagging this as the 1.0.0 release. Expect new major versions to arrive whenever we need to adjust the API.

New since the previous release:

“Truth” lines and other vertical/horizontal line series.
Automatic figure sizing (unless you are plotting into an existing layout, e.g. pairplot(fig[1,1]))
Separate options for step-histograms and filled-histograms
Improved automatic legend creation
Integration with MCMCChains via a new Julia 1.9+ package extension. Just run pairplot(chains).

Extensive guide and docs: Home · PairPlots.jl

Two series overplotted with truth lines:

MCMCChains support & figure auto-sizing

Thanks as always to the Makie maintainers who make this possible!

sefffal · November 28, 2023, 8:05pm

New release of PairPlots

PairPlots is now updated to 2.1.0, with the following new features:

Support for Unitful and DynamicQuantities

If you plot a table with columns containing either Unitful units or DynamicQuantity units, the units will automatically be pulled out into series labels.

If you would like more customization, you can as always pass the labels keyword argument with your desired label formatting.

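using DataFrames, PairPlots, CairoMakie
using Unitful # or DynamicQuantities
# Columns carry units; they are pulled out into the labels automatically.
df = DataFrame((;a=randn(10000)*u"m",b=randn(10000)*u"m/s"))
pairplot(df)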

Support for missing data

You can now plot tables that contain missing data in some rows.
To be conservative, any row with a missing value is removed from the figure (instead of only skipping some sub-plots).
Feedback welcome on this behaviour.

Whenever missing data is skipped, an annotation is added to the bottom of the plot.
If you don’t want the annotation, just drop the missing rows yourself, e.g. with DataFrames.dropmissing, before you pass the table in.

# Generate some random, sometimes missing, data.
df = DataFrame(randn(1000,3) .* rand.(Ref((missing, 1, 1, 1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))), :auto)
pairplot(df)

This works for multiple series as well:

pairplot(
    PairPlots.Series(df1,label="one",color=(:darkblue,0.5)),
    PairPlots.Series(df2,label="two",color=(:darkgreen,0.5))
)
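To make the row-wise rule for missing data concrete, here is a minimal sketch in plain Julia. The helper drop_missing_rows is hypothetical (it is not part of PairPlots) and it assumes the table is a NamedTuple of equal-length vectors. It only illustrates the "drop the whole row if any column is missing" behaviour described above.

```julia
# Hypothetical helper, not part of PairPlots: keep only the rows of a
# column table (NamedTuple of equal-length vectors) with no missing values,
# and report how many rows were skipped.
function drop_missing_rows(tbl::NamedTuple)
    n = length(first(tbl))
    # A row is kept only if every column has a value in that row.
    keep = [all(col -> !ismissing(col[i]), values(tbl)) for i in 1:n]
    cleaned = map(col -> collect(skipmissing(col[keep])), tbl)
    return cleaned, n - count(keep)
end

tbl = (; a = [1.0, missing, 3.0, 4.0], b = [10.0, 20.0, missing, 40.0])
cleaned, nskipped = drop_missing_rows(tbl)
```

Note that row 2 is dropped from column b even though b has a value there. That is the conservative choice: every sub-plot is drawn from exactly the same set of rows.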

Support for Makie 0.20

Self-explanatory

Thanks everyone, and keep those feature-requests and bug-reports coming!

-WT

sefffal · December 18, 2023, 4:09pm

New Feature: full grid of plots above and below the diagonal

Thanks @aplavin for the feature request! PairPlots.jl now supports displaying a full grid of plots if you pass the fullgrid=true option:

N = 100000
α = [2randn(N÷2) .+ 6; randn(N÷2)]
β = [3randn(N÷2); 2randn(N÷2)]
γ = randn(N)
δ = β .+ 0.6randn(N)
df = (;α, β, γ, δ);

pairplot(df, fullgrid=true)

Adding a legend is currently not supported when combined with a full grid plot but will be added back in a future version.

While we’re at it, here’s a demo of just some of the ways you can customize the look of a pair plot:


using PairPlots, CairoMakie

N = 100000
θ = 8rand(N) .+ 0.1 .* randn.()
r = atan.(θ) .+ 0.1 .* randn.()
x = r.*cos.(θ) 
y = r.*sin.(θ) 
tbl = (;x,y,θ,r=r.^2)


fig = Figure(
    size=(1000,500)
)
pairplot(
    fig[1,1],
    tbl => (
        PairPlots.Hist(sigmas=[3],color=:black, linewidth=2),
        PairPlots.Contour(sigmas=[1],color=:blue, linewidth=2),
        PairPlots.Scatter(markersize=1 ,color=(:blue,1), filtersigma=3),
        PairPlots.MarginDensity(color=:white),
        PairPlots.MarginConfidenceLimits(color=:orange, linestyle=:dot, linewidth=5)
    ),
    PairPlots.Truth(
        (;x = 0, y=0),
        color=:darkred    
    ),
    fullgrid=true,
    labels = Dict(
        :θ => "angle θ",
        :r => Makie.rich("radius r", font=:bold, color=:red),
        :x => L"\sum_i^N{x_i}",
        :y => Makie.rich("position y", font=:italic)
    ),
    bodyaxis=(;
        aspect=1,
        backgroundcolor=:lightgray
    ),
    diagaxis=(;
        backgroundcolor=:black
    )
)

ax = Axis3(fig[1,2])
scatter!(ax,x,y,r,markersize=1)
hidedecorations!(ax)
Makie.Label(fig[0,:], "Super Title", fontsize=20)
fig

Future Roadmap

The next major feature I would like to land is support for the new Makie declarative API. It’s my understanding that adopting this API will make it possible to animate PairPlots, or to update them live, e.g. while an MCMC run is sampling. I’ll aim to do this as soon as the new API is stabilized by the Makie team.

Another feature request is for categorical axes. A contribution in the form of a PR adding this feature would be well-received.

Thanks all, and keep the feature requests coming!

juliohm · December 18, 2023, 4:23pm

This package gets better and better! Super useful for multivariate data analysis!

sefffal · December 28, 2023, 10:22pm

Announcing a new minor release (that should be in the registry soon).

New Feature: Trend Lines

This release added support for displaying simple linear trend lines fit to each pair of variables, skipping any missing values.

See Failing to plot correlograms in Julia: Makie vs. AoG vs StatsPlots - #23 by sefffal for the discussion that prompted this addition.

pairplot(
    table => (
        # choose what kind of series you want in body and along diagonal
        PairPlots.Scatter(),
        PairPlots.MarginHist(),
        # Add trend line
        PairPlots.TrendLine(color=:red),
    ),
    fullgrid=true
)

If the data is ill-conditioned (such that calculating the line results in a singular exception), then no line is displayed.
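A minimal sketch of the idea, assuming an ordinary least-squares fit solved through the normal equations. The function fit_trend is hypothetical and is not the PairPlots implementation. It skips any pair with a missing value and returns nothing when the solve throws a SingularException.

```julia
using LinearAlgebra

# Hypothetical helper, not the PairPlots implementation: least-squares fit of
# y ≈ intercept + slope * x, skipping pairs where either value is missing.
function fit_trend(x, y)
    keep = .!(ismissing.(x) .| ismissing.(y))
    xs = Float64.(x[keep])
    ys = Float64.(y[keep])
    A = [ones(length(xs)) xs]                      # design matrix [1 x]
    try
        intercept, slope = (A' * A) \ (A' * ys)    # normal equations
        return (; intercept, slope)
    catch err
        err isa SingularException || rethrow()
        return nothing                             # no line can be drawn
    end
end

fit = fit_trend([1.0, 2.0, missing, 3.0], [2.0, 4.0, 5.0, 6.0])
flat = fit_trend([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])  # constant x: singular
```

This sketch only catches systems that are exactly singular. Nearly singular data would still produce a (poorly determined) line, so a real implementation may want an additional conditioning check.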

I’m interested in expanding on this in the future. I can imagine wanting to see the formula, correlation coefficient, etc., as well as to display arbitrarily complex models.

Perhaps this could be added via extension packages, e.g. for GLM.jl.

I’d be interested to hear what the community thinks would be a good approach.

sefffal · January 29, 2024, 6:13pm

Announcing a new minor release of PairPlots.

Improvement: Automatic choice of significant figures in titles

Starting with version 2.4.0, the number of decimal places in titles specifying the credible interval is now calculated automatically.

Previously, if you plotted a variable with a very small range, e.g. 0.0004 ± 0.001, the titles were always rounded to 0.00 ± 0.00. Not ideal! Thanks @astrozot on GitHub for the bug report.

If the variable range becomes smaller than ± 0.0001, we switch to using scientific notation automatically, e.g. (1 ± 0.1) × 10^-6.

Improvement: Use ± (\pm) when possible

Credible ranges like 10^{+2.5}_{-2.5} are now displayed as 10±2.5.

See example below.

I consider this release non-breaking (not requiring a major version bump) because the change in plot formatting is quite minor. This is furthermore a feature release instead of a patch release because this formatter can now be specified as a function instead of just a format string.
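For readers who want to supply their own formatter, here is a rough sketch of the two rules above: pick the number of decimal places from the size of the error bar, and collapse symmetric intervals to ±. The function format_interval, its signature, and the "two significant figures of the smaller error" rule are all assumptions for illustration. It is not the formatter shipped with PairPlots, and it leaves out the switch to scientific notation.

```julia
using Printf

# Hypothetical formatter, not the one shipped with PairPlots.
# `upper` and `lower` are the positive sizes of the error bars (assumed nonzero).
function format_interval(mid, upper, lower)
    err = min(abs(upper), abs(lower))
    # Enough decimal places to show two significant figures of the smaller error.
    decimals = max(0, 1 - floor(Int, log10(err)))
    fmt = Printf.Format("%.$(decimals)f")
    m = Printf.format(fmt, mid)
    if upper == lower
        # Symmetric interval: use ± instead of separate +/- values.
        return "$m ± $(Printf.format(fmt, upper))"
    end
    return "$m +$(Printf.format(fmt, upper)) -$(Printf.format(fmt, lower))"
end

small = format_interval(0.0004, 0.002, 0.002)
symmetric = format_interval(10.0, 2.5, 2.5)
asymmetric = format_interval(1.0, 0.30, 0.20)
```

With a fixed two-decimal format the first example would collapse to 0.00 ± 0.00, which is the problem the automatic choice avoids.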

Happy plotting!

sefffal · February 5, 2024, 4:09pm

Feedback Requested on Default Styles

Question 1

Up until now, the number of bins used for the default style was always 32. If I recall correctly, this is the default in corner.py. I think it would be better to choose this number dynamically using something like Sturges’s formula.

Here is the before and after for 100,000 points, 10,000 points, 1000 points, and 100 points:

The net result is that most existing plots will look a little more blocky. Note that as always you can override the number of bins when constructing the plot, this is just the default (that I think most people use anyways).
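For reference, Sturges's formula gives k = ⌈log2(n)⌉ + 1 bins for n samples. The snippet below is a small sketch of that formula only. The function name sturges_bins is made up here, and the exact rule used by the package may differ, since the proposal is for "something like" Sturges's formula.

```julia
# Sturges's formula: k = ceil(log2(n)) + 1 bins for n samples.
# The name `sturges_bins` is illustrative, not a PairPlots function.
sturges_bins(n::Integer) = ceil(Int, log2(n)) + 1

for n in (100_000, 10_000, 1_000, 100)
    println(n, " points -> ", sturges_bins(n), " bins")
end
```

For the four sample sizes above this gives 18, 15, 11, and 8 bins, all below the previous fixed default of 32, which is why the histograms look blockier.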

Are users okay with this change? Please let me know if not, otherwise I will release this as a new minor version in the coming days.

Question 2

I am also wondering if we should not add a traditional histogram next to the smoothed kernel density estimates along the diagonals (for the default styles). The KDE looks really nice but can sometimes smooth over interesting structure in the data. Maybe it’s safer to do something like this:

I would be interested to hear your thoughts. I won’t merge this change for the time being.

juliohm · February 5, 2024, 10:04pm

I like the proposal of adding the histogram on top of the kde by default.

sefffal · February 5, 2024, 10:23pm

Thanks for the feedback. Do you think making this kind of style change is acceptable in a minor release, or would this be better as a major release?

I don’t really know whether default plot outputs should be considered part of SemVer or not.

juliohm · February 5, 2024, 11:22pm

I would personally consider it as a small improvement of defaults. People can always add a keyword option to be more explicit in their downstream scripts if they dislike defaults.

sefffal · February 6, 2024, 12:55am

The changes described above are now released. Thanks for your inputs!

sefffal · February 9, 2024, 5:55pm

Announcing a new minor release of PairPlots

Improvement: address crowding of tick labels

In previous versions, tick labels could overlap with each other becoming unreadable. This happened whenever the formatted tick labels exceeded about 4 digits.

I experimented with trying to detect this situation, but in the end I found the code was quite brittle. We would need to guess where Makie would put the automatic ticks, format the value, and measure its length (or a similar heuristic).

Instead, I have simply rotated the x and y tick labels by 45 degrees. This gives them a lot more room without crowding.

Before:

After:

Another approach would be to rotate only the bottom ticks by 90 degrees. This looks a bit cleaner but I think it is harder to read.

juliohm · February 9, 2024, 6:17pm

This is really useful! Thanks for the update and hard work on this @sefffal !
