Awesome to see the new release! Looking forward to trying it out
Looks really nice! By the way, I almost added super-above-subscripts to rich but then removed it again before merging because the possible line-break semantics weren’t clear to me. But seeing this real-world use case, maybe I should think about how to get it back in, because it would be nicer not to have to mix LaTeX and normal font styles just for that.
Another comment: with weak dependencies coming in Julia 1.9, you could factor the Makie support out into a submodule that is only loaded when Makie is loaded as well. Then users of Plots wouldn’t need to pay the latency increase.
@sefffal if you feel that the package will target Makie moving forward, you can consider moving it to MakieOrg for greater visibility:
That would be great! It would equally be really nice if there was a way to use the LaTeX functionality with the default font (unless, say, a glyph was not available). That way the LaTeXified axes labels could match too.
Great point. I am still debating whether to keep the Plots.jl functionality or not. All the bookkeeping to layout the subplots makes the code very messy.
To make the Plots.jl version work, I actually make all the plots “inset subplots” with absolute positioning, all inside a single large plot with hidden axes. Then I have to carefully manage the axis limits manually between all plots. Not so nice!
The release is merged and tagged, and a pile of new docs is live here!
I also added a basic Legend feature that can help with multiple series.
LaTeX is a bit more complicated because glyphs like integral signs need to be adjusted so they look good with superscripts and subscripts, and these relationships differ per font. Text layout in Makie is much simpler and only concatenates letters left to right without kerning.
Announcing PairPlots.jl version 1.0.0!
In my opinion all the basic functionality of PairPlots is now present, so to avoid falling into the 0ver trap, I’m tagging this as the 1.0.0 release. Expect new major versions to arrive whenever we need to adjust the API.
New since the previous release:
- “Truth” lines and other vertical/horizontal line series.
- Automatic figure sizing (unless you are plotting into an existing layout, e.g. pairplot(fig[1,1]))
- Separate options for step-histograms and filled-histograms
- Improved automatic legend creation
- Integration with MCMCChains via a new Julia 1.9+ package extension. Just run pairplot(chains).
Extensive guide and docs: Home · PairPlots.jl
Two series overplotted with truth lines:
MCMCChains support & figure auto-sizing
Thanks as always to the Makie maintainers who make this possible!
New release of PairPlots
PairPlots is now updated to 2.1.0, with the following new features:
Support for Unitful and DynamicQuantities
If you plot a table with columns containing either Unitful units or DynamicQuantity units, the units will automatically be pulled out into series labels.
If you would like more customization, you can as always pass the labels keyword argument with your desired label formatting.
using PairPlots, CairoMakie, DataFrames
using Unitful # or DynamicQuantities
df = DataFrame((;a=randn(10000)*u"m",b=randn(10000)*u"m/s"))
pairplot(df)
Support for missing data
You can now plot tables that contain missing data in some rows.
To be conservative, any row with a missing value is removed from the figure (instead of only skipping some sub-plots).
Feedback welcome on this behaviour.
Whenever missing data is skipped, an annotation is added to the bottom of the plot.
If you don’t want the annotation, just drop the missing rows yourself, e.g. with DataFrames.dropmissing, before you pass the table in.
# Generate some random, sometimes missing, data.
df = DataFrame(randn(1000,3) .* rand.(Ref((missing, 1, 1, 1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))), :auto)
pairplot(df)
This works for multiple series as well:
pairplot(
    PairPlots.Series(df1, label="one", color=(:darkblue,0.5)),
    PairPlots.Series(df2, label="two", color=(:darkgreen,0.5))
)
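As a rough illustration of the "drop any row with a missing value" rule described above, here is a minimal sketch in plain Julia. It is not the PairPlots implementation; `drop_incomplete_rows` is a hypothetical helper that works on a NamedTuple of equal-length columns and also reports how many rows were skipped (the number you would put in an annotation).

```julia
# Hypothetical helper, not part of PairPlots: keep only the complete rows
# of a column table (a NamedTuple of equal-length vectors).
function drop_incomplete_rows(tbl::NamedTuple)
    n = length(first(tbl))
    # A row is kept only if no column has `missing` at that index.
    keep = [all(col -> !ismissing(col[i]), values(tbl)) for i in 1:n]
    # Filter every column with the same mask and narrow the element type.
    cleaned = map(col -> collect(skipmissing(col[keep])), tbl)
    return cleaned, n - count(keep)
end

tbl = (; a = [1.0, missing, 3.0, 4.0], b = [10.0, 20.0, missing, 40.0])
cleaned, nskipped = drop_incomplete_rows(tbl)
println("kept ", length(cleaned.a), " rows, skipped ", nskipped)
```

Using one shared mask for all columns is what keeps every sub-plot showing the same set of points, at the cost of discarding some valid values.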
Support for Makie 0.20
Self-explanatory
Thanks everyone, and keep those feature-requests and bug-reports coming!
-WT
New Feature: full grid of plots above and below the diagonal
Thanks @aplavin for the feature request! PairPlots.jl now supports displaying a full grid of plots if you pass the fullgrid=true option:
N = 100000
α = [2randn(N÷2) .+ 6; randn(N÷2)]
β = [3randn(N÷2); 2randn(N÷2)]
γ = randn(N)
δ = β .+ 0.6randn(N)
df = (;α, β, γ, δ);
pairplot(df, fullgrid=true)
Adding a legend is currently not supported when combined with a full grid plot but will be added back in a future version.
While we’re at it, here’s a demo of just some of the ways you can customize the look of a pair plot:
using PairPlots, CairoMakie
N = 100000
θ = 8rand(N) .+ 0.1 .* randn.()
r = atan.(θ) .+ 0.1 .* randn.()
x = r.*cos.(θ)
y = r.*sin.(θ)
tbl = (;x,y,θ,r=r.^2)
fig = Figure(
    size=(1000,500)
)
pairplot(
    fig[1,1],
    tbl => (
        PairPlots.Hist(sigmas=[3], color=:black, linewidth=2),
        PairPlots.Contour(sigmas=[1], color=:blue, linewidth=2),
        PairPlots.Scatter(markersize=1, color=(:blue,1), filtersigma=3),
        PairPlots.MarginDensity(color=:white),
        PairPlots.MarginConfidenceLimits(color=:orange, linestyle=:dot, linewidth=5)
    ),
    PairPlots.Truth(
        (;x = 0, y=0),
        color=:darkred
    ),
    fullgrid=true,
    labels = Dict(
        :θ => "angle θ",
        :r => Makie.rich("radius r", font=:bold, color=:red),
        :x => L"\sum_i^N{x_i}",
        :y => Makie.rich("position y", font=:italic)
    ),
    bodyaxis=(;
        aspect=1,
        backgroundcolor=:lightgray
    ),
    diagaxis=(;
        backgroundcolor=:black
    )
)
ax = Axis3(fig[1,2])
scatter!(ax,x,y,r,markersize=1)
hidedecorations!(ax)
Makie.Label(fig[0,:], "Super Title", fontsize=20)
fig
Future Roadmap
The next major feature I would like to land is support for the new Makie declarative API. It’s my understanding that adopting this API will allow PairPlots to be animated, or updated live, e.g. while an MCMC run is sampling. I’ll aim to do this as soon as the new API is stabilized by the Makie team.
Another feature request is for categorical axes. A contribution in the form of a PR adding this feature would be well-received.
Thanks all, and keep the feature requests coming!
This package gets better and better! Super useful for multivariate data analysis!
Announcing a new minor release (that should be in the registry soon).
New Feature: Trend Lines
This release added support for displaying simple linear trend lines fit to each pair of variables, skipping any missing values.
See Failing to plot correlograms in Julia: Makie vs. AoG vs StatsPlots - #23 by sefffal for the discussion that prompted this addition.
pairplot(
    table => (
        # choose what kind of series you want in body and along diagonal
        PairPlots.Scatter(),
        PairPlots.MarginHist(),
        # Add trend line
        PairPlots.TrendLine(color=:red),
    ),
    fullgrid=true
)
If the data is ill-conditioned (such that calculating the line results in a singular exception), then no line is displayed.
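To make the behaviour above concrete, here is a minimal sketch of a linear trend fit that skips missing values and returns nothing when the system is singular. This is an assumption about the general approach, not the code inside PairPlots.TrendLine; `fit_trend` is a hypothetical name, and the normal-equations solve is just one simple way to get a SingularException for degenerate data.

```julia
using LinearAlgebra

# Hypothetical helper, not the PairPlots implementation.
# Fits y ≈ intercept + slope * x by least squares.
function fit_trend(x, y)
    # Skip any pair in which either value is missing.
    pairs = [(Float64(a), Float64(b)) for (a, b) in zip(x, y)
             if !ismissing(a) && !ismissing(b)]
    xs = first.(pairs)
    ys = last.(pairs)
    X = [ones(length(xs)) xs]   # design matrix: constant column plus x
    try
        # Solve the normal equations (X'X) β = X'y.
        intercept, slope = (X' * X) \ (X' * ys)
        return (; intercept, slope)
    catch err
        # Singular system (e.g. all x identical): report "no line".
        err isa SingularException || rethrow()
        return nothing
    end
end

line = fit_trend([1.0, 2.0, missing, 3.0, 4.0], [3.0, 5.0, 100.0, 7.0, 9.0])
flat = fit_trend([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])   # constant x
```

In this sketch the caller simply checks for `nothing` and skips drawing the line for that pair of variables.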
I’m interested in expanding on this in the future. I can imagine wanting to see the formula, correlation coefficient, etc., as well as display arbitrarily complex models.
Perhaps this could be added via extension packages, e.g. for GLM.jl.
I’d be interested to hear what the community thinks would be a good approach.
Announcing a new minor release of PairPlots.
Improvement: Automatic choice of significant figures in titles
Starting with version 2.4.0, the number of decimal places in titles specifying the credible interval is calculated automatically.
Previously, if you plotted a variable with a very small range, e.g. 0.0004 ± 0.001, the titles were always rounded to 0.00 ± 0.00. Not ideal! Thanks @astrozot on GitHub for the bug report.
If the variable range becomes smaller than ± 0.0001, we switch to using scientific notation automatically, e.g. (1 ± 0.1) × 10^-6.
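For anyone curious what this kind of rule can look like, here is a minimal sketch. It is not the formatter shipped in PairPlots: `format_interval` is a hypothetical function, and the choice of two significant figures for the uncertainty is my own assumption. Only the 0.0001 threshold for switching to scientific notation comes from the description above.

```julia
using Printf

# Hypothetical formatter, not the one used by PairPlots.
function format_interval(value, err)
    err > 0 || throw(ArgumentError("err must be positive"))
    if err < 1e-4
        # Very small ranges: factor out a common power of ten.
        expo = floor(Int, log10(max(abs(value), err)))
        scale = 10.0^expo
        return @sprintf("(%.1f ± %.1f) × 10^%d", value / scale, err / scale, expo)
    end
    # Otherwise keep enough decimal places to show two significant
    # figures of the uncertainty (an assumption made for this sketch).
    decimals = max(0, 1 - floor(Int, log10(err)))
    fmt = Printf.Format("%.$(decimals)f ± %.$(decimals)f")
    return Printf.format(fmt, value, err)
end

println(format_interval(0.0004, 0.0012))
println(format_interval(3.2e-6, 4.0e-7))
```

The sketch scales by whichever of the value or the uncertainty is larger, so it will not format well when a large value has a tiny uncertainty.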
Improvement: Use ± (\pm) when possible
Credible ranges like 10^{+2.5}_{-2.5} are now displayed as 10±2.5.
See example below.
I consider this release non-breaking (not requiring a major version bump) because the change in plot formatting is quite minor. This is furthermore a feature release instead of a patch release because this formatter can now be specified as a function instead of just a format string.
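The ± rule mentioned above boils down to comparing the upper and lower uncertainties after rounding. Here is a minimal sketch of that idea; `format_credible` and its `digits` keyword are hypothetical and do not reflect the signature PairPlots expects for a custom formatter function.

```julia
# Hypothetical helper: collapse equal upper/lower uncertainties into one ± term.
function format_credible(mid, upper, lower; digits = 1)
    m = round(mid; digits = digits)
    u = round(upper; digits = digits)
    l = round(lower; digits = digits)
    # Compare after rounding, so 2.54 and 2.46 both count as 2.5.
    return u == l ? "$m ± $u" : "$(m)^{+$u}_{-$l}"
end

println(format_credible(10.0, 2.54, 2.46))
println(format_credible(10.0, 3.1, 2.5))
```

Comparing the rounded values rather than the raw ones means the symmetric form is used whenever the two sides would print identically anyway.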
Happy plotting!
Feedback Requested on Default Styles
Question 1
Up until now, the number of bins used for the default style was always 32. If I recall correctly, this is the default in corner.py. I think it would be better to choose this number dynamically using something like Sturges’s formula.
Here is the before and after for 100,000 points, 10,000 points, 1000 points, and 100 points:
The net result is that most existing plots will look a little more blocky. Note that, as always, you can override the number of bins when constructing the plot; this is just the default (which I think most people use anyway).
Are users okay with this change? Please let me know if not, otherwise I will release this as a new minor version in the coming days.
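For reference, the textbook form of Sturges's formula is k = ceil(log2(n)) + 1 bins for n samples. The sketch below just evaluates it for the four sample sizes mentioned above. The exact rule PairPlots ends up using may differ, since the proposal only says "something like" Sturges's formula, and `sturges_bins` is a name made up for this example.

```julia
# Sturges's formula: k = ceil(log2(n)) + 1 bins for n samples.
sturges_bins(n::Integer) = ceil(Int, log2(n)) + 1

for n in (100_000, 10_000, 1_000, 100)
    println(n, " points: ", sturges_bins(n), " bins")
end
```

All four counts come out below the old fixed value of 32, which matches the "more blocky" look described above.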
Question 2
I am also wondering whether we should add a traditional histogram next to the smoothed kernel density estimates along the diagonals (for the default styles). The KDE looks really nice but can sometimes smooth over interesting structure in the data. Maybe it’s safer to do something like this:
I would be interested to hear your thoughts. I won’t merge this change for the time being.
I like the proposal of adding the histogram on top of the kde by default.
Thanks for the feedback. Do you think making this kind of style change is acceptable in a minor release, or would this be better as a major release?
I don’t really know whether default plot outputs should be considered part of SemVer or not.
I would personally consider it as a small improvement of defaults. People can always add a keyword option to be more explicit in their downstream scripts if they dislike defaults.
The changes described above are now released. Thanks for your inputs!
Announcing a new minor release of PairPlots
Improvement: address crowding of tick labels
In previous versions, tick labels could overlap with each other and become unreadable. This happened whenever the formatted tick labels exceeded about 4 digits.
I experimented with trying to detect this situation, but in the end I found the code was quite brittle. We would need to guess where Makie would put the automatic ticks, format the value, and measure its length (or a similar heuristic).
Instead, I have simply rotated the x and y tick marks by 45 degrees. This gives them a lot more room without crowding.
Before:
After:
Another approach would be to rotate only the bottom ticks by 90 degrees. This looks a bit cleaner but I think it is harder to read.
This is really useful! Thanks for the update and hard work on this @sefffal !