Choosing a Plotting Base to Build a Statistical Plotting Package

If I wanted to build a package that adds more kinds of statistical plots to Julia, what would be the best plotting package to build it on top of (e.g. Plots.jl, Makie, Gadfly, VegaLite)? What advantages and disadvantages does each have? The most important things to me are being able to easily build new kinds of plots and letting users customize them themselves.

(This wouldn’t really belong in StatsPlots or any kind of base plotting package, as the plots I’d like to add are fairly narrow in scope and meant to help with a specific kind of statistical analysis.)

IANAPPD (I am not a plotting package developer), but here are my 2 cents. Without more details from you, I would guess it is best to start with Plots.jl and its recipe system, because it is the most widely used AFAIK.

That being said, I think the right choice specifically depends on what types of plots you want to work on (the quick and dirty type? the good looking (publication quality) type? Rasterized or vectorized? with LaTeX? Interactive or static? Geo-referenced (like maps and stuff)? 2D or 3D? and so on and so forth…) So if you can be specific on the plot types you want to target, then plotting-package developers here will be able to guide you better.

Some questions: Are you targeting the browser or print or …? How many points appear in your plots? How fast does it need to be?

In general I want good-looking, publication-quality graphics that can be printed. Speed is definitely very important, since some datasets might be very large. No geography/LaTeX are required; I don’t really care whether it outputs rasterized or vectorized (so long as I can choose the resolution for rasterized graphics). Static is good. 3d is a bonus but not very important. The thing I want is roughly some kind of equivalent of ggplot2 in R, which lets me pass objects to users that they can easily manipulate to get exactly what they want by customizing details.

The kinds of graphics I want to build are similar to the ones shown here.

I think you’re looking for something like AlgebraOfGraphics, which is being developed on top of Makie. @piever is the main developer, I believe, and he can share more about the project.

StatsPlots.jl may already have a lot of what you’re looking for, so it could be a good base.

I haven’t looked too much at the newer plotting packages, but I normally reach for Gadfly.

If you target Pluto.jl as a frontend, you can use the same approach as in https://github.com/j-fu/PlutoVista.jl to wrap a JS plotting library.

It looks like it has almost all the features I’d like to use in terms of basic objects, but from the examples, it doesn’t seem to be very easy to extend. For instance, let’s say I wanted to put together a scatter plot of a 2-d multivariate normal, using the covariance ellipse shown at the bottom to show a 95% prediction interval and a rug plot to show the marginals. In ggplot2 pseudocode, this would look kind of like:

ggplot(data, aes(x, y)) + geom_point()  + geom_ellipse() + geom_rug()

I have no idea how I would get something like this to work in StatsPlots. Gadfly seems easier to extend.
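For reference, the ellipse layer itself does not depend on any plotting package. Here is a minimal sketch that computes its boundary using only the standard library. The function name `covariance_ellipse` and its keyword arguments are made up for illustration. It is a plug-in ellipse that treats the sample mean and covariance as if they were the true parameters, so for small samples it is only an approximation to a proper prediction region.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not from any package): boundary of the plug-in
# `level` ellipse for a bivariate normal fitted to the samples (x, y).
function covariance_ellipse(x, y; level=0.95, npoints=100)
    μ = [mean(x), mean(y)]
    Σ = cov(hcat(x, y))                  # 2×2 sample covariance matrix
    # With 2 degrees of freedom the chi-square quantile has a closed form.
    r2 = -2 * log(1 - level)
    L = cholesky(Symmetric(Σ)).L         # Σ == L * L'
    θ = range(0, 2π; length=npoints)
    circle = vcat(cos.(θ)', sin.(θ)')    # 2×npoints points on the unit circle
    pts = μ .+ sqrt(r2) .* (L * circle)  # stretch, rotate, and shift the circle
    return pts[1, :], pts[2, :]
end
```

The two returned vectors can then be handed to whichever backend ends up being the base, for example as a line drawn on top of an existing scatter plot.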

AlgebraOfGraphics looks great! The only problem is it seems pretty new and very small, with only one or two people working on it. I don’t know how feature-complete or stable it is, or whether I can rely on it to stick around for the long-term – I’d probably feel a lot more comfortable using it if it was part of Makie’s base.

Are you specifically looking to make a package for Bayesian workflow plotting?

Well, just for your info, it’s as much part of Makie’s base as possible :wink:

You may want to join forces

https://discourse.julialang.org/t/jsoc-2021-student-introduction-improve-different-aspects-of-mcmcchains-jl/

A related question: why do you need to make the plots automatically anyway? Why not write functions that return objects that can be plotted, like DataFrames, and leave the plotting up to the user?
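To make that concrete, here is a rough sketch of the "return data, not plots" approach, using regression residuals as the example. `residual_table` is a hypothetical name, and the model is just a straight-line least-squares fit chosen for illustration. It returns a NamedTuple of vectors, which is a valid Tables.jl column table, so it should convert directly to a DataFrame.

```julia
# Hypothetical helper: fit y ≈ a + b*x by least squares and return plain
# columns that the user can plot with any package they like.
function residual_table(x, y)
    X = hcat(ones(length(x)), x)   # design matrix with an intercept column
    β = X \ y                      # least-squares coefficients [a, b]
    fitted = X * β
    return (x = collect(x), fitted = fitted, residual = y .- fitted)
end
```

The user then decides how to draw `residual` against `fitted`, and the package never has to depend on a plotting backend at all.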

Hi! As @EvoArt suggested, this summer I’ll be working as a JSoC student to improve the plotting functionality of the MCMCChains package from the Turing ecosystem. My idea for the JSoC project is similar to yours. Based on some state-of-the-art packages for Bayesian data analysis, such as RStan and ggmcmc from R or PyMC3 and ArviZ from Python, the idea is to add elements to the plots currently used in MCMCChains.jl and modify the plot defaults in order to improve and facilitate the visualization, presentation, and interpretation of results. To accomplish this, I’ll modify, extend, or create recipes for the MCMCChains package (which uses StatsPlots.jl). If you’d like, we can discuss this further!

I checked the link you posted about ArviZ example plots, and as mentioned above, StatsPlots.jl is a good start. For what you ask here, maybe this, this and this can help.

FWIW I was initially put off Plots/StatsPlots, as I don’t like the default look. But once you figure out how to customise stuff, that’s not an issue. Obviously, if you were building your own package on top, you could customise it to give the look you want.

Rather than geom_A + geom_B, Plots uses A; B!. But it seems like it wouldn’t be hard to make a macro to switch to the former style. Maybe I’m missing something though. I’m far from an expert.
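As a rough illustration of how the `+` style could be layered on top of mutating plot calls, here is a sketch that uses operator overloading instead of a macro. `Layer`, `LayerStack`, and `render!` are made-up names, not part of Plots.jl, and the drawing target here is just a vector of strings so that the example runs without any plotting package.

```julia
# Hypothetical types for illustration only; none of this is Plots.jl API.
struct Layer
    f::Function   # a mutating draw function called as f(target, args...)
    args::Tuple   # the data for this layer
end

struct LayerStack
    layers::Vector{Layer}
end

# `a + b + c` collects layers in order instead of drawing immediately.
Base.:+(a::Layer, b::Layer) = LayerStack([a, b])
Base.:+(s::LayerStack, b::Layer) = LayerStack([s.layers..., b])

# Apply every layer, in order, to the same target.
function render!(target, s::LayerStack)
    for layer in s.layers
        layer.f(target, layer.args...)
    end
    return target
end

# Stand-in "backend": each layer just records its name in a vector.
record!(target, name) = push!(target, name)
stack = Layer(record!, ("points",)) + Layer(record!, ("ellipse",)) + Layer(record!, ("rug",))
drawn = render!(String[], stack)
```

With a real backend, the idea would be to pass mutating functions like `scatter!` and a plot object as the target, though I have not tested that. Users could then add or drop layers before rendering, which is the kind of customization ggplot2 allows.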

Did you choose a base in the end?
Seems like a cool project.

Also, see here: Rewriting MCMCChains with Makie.jl+AlgebraOfGraphics (Issue #306 in TuringLang/MCMCChains.jl on GitHub)

I haven’t started work on it since I’ve been preoccupied with other projects, but I do think Makie+AlgebraOfGraphics is probably the way to go.

Hi, I’ve been thinking about doing something similar too (a package of convenience plotting functions for statistics, e.g. plotting residuals for regression with one command), and I had similarly been thinking about using AlgebraOfGraphics (though I’m a little worried that it might not be stable enough).

It will probably be a while before I can start on this, but if you are interested in casually working on this together (e.g. a couple of hours every week or every other week), I’d be interested in doing that.

If you’re interested, I strongly recommend checking out the #Turing and #probprog channels in the Slack and asking about Turkie.jl, which is similar to what I had in mind.