Upcoming AlgebraOfGraphics release

This is a pre-announcement about an upcoming release of AlgebraOfGraphics:

AlgebraOfGraphics defines a language to make data visualizations by combining basic building blocks with * and +. It is based on Makie to actually draw the plots.

The code base had grown a bit out of hand and has mostly been rewritten to make sure that the library works robustly. It will still take a little bit of work before tagging the next release, but it’s probably a good time for users to try out the unreleased version (with ] add AlgebraOfGraphics#master) and give feedback. There are many planned features, and it would be useful to get a sense of which are most needed.

To get a sense of how the next release will work, you can try out the brand new penguin tutorial :penguin: or read the philosophy section of the docs.

Demo

axis = (width = 225, height = 225)
penguin_bill = data(penguins) * mapping(
    :bill_length_mm => (t -> t / 10) => "bill length (cm)",
    :bill_depth_mm => (t -> t / 10) => "bill depth (cm)",
)
layers = linear() + mapping(col = :sex)
plt = penguin_bill * layers * mapping(color = :species)
draw(plt; axis)
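
For readers wondering why * and + compose this way, here is a toy model of the idea in plain Julia. It is only a sketch: ToyLayer, ToyLayers and toy are hypothetical names made up for illustration and say nothing about how AlgebraOfGraphics is implemented. It just shows that if + stacks layers and * merges settings pairwise, then shared settings distribute over every layer.

```julia
# Toy model only: these types are hypothetical, not AlgebraOfGraphics internals.
struct ToyLayer
    settings::Dict{Symbol,Any}
end

struct ToyLayers
    layers::Vector{ToyLayer}
end

# Build a single-layer specification from keyword settings.
toy(; kwargs...) = ToyLayers([ToyLayer(Dict{Symbol,Any}(kwargs))])

# `+` stacks layers on top of each other.
Base.:+(a::ToyLayers, b::ToyLayers) = ToyLayers(vcat(a.layers, b.layers))

# `*` merges the settings of every pair of layers, so it distributes over `+`.
Base.:*(a::ToyLayers, b::ToyLayers) =
    ToyLayers([ToyLayer(merge(x.settings, y.settings)) for x in a.layers for y in b.layers])

shared = toy(data = :penguins, color = :species)
scatter_layer = toy(plot = :scatter)
linear_layer = toy(plot = :linear)

# Shared settings are written once and end up in both layers.
spec = shared * (scatter_layer + linear_layer)
expanded = shared * scatter_layer + shared * linear_layer
```

Both spec and expanded hold two layers with identical settings, which is the distributivity that the demo relies on.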

51 Likes

This looks extremely promising! I like the composition design that encourages variables with useful names and reuses the DataFrames mapping syntax. Using distributivity for shared configuration seems just right. Also impressed by your theme… pleasant and very readable.

Small feedback on the API: are you sure about abbreviations such as “col” and “wts”? I think they hurt accessibility for new users, and readability in general, which outweighs the modest gain in keystrokes. Subjective matter of course :slight_smile:

11 Likes

I’m looking forward to using this.


I agree that abbreviations make reading harder. It can make writing harder too since I have to remember the word and its abbreviation.


A separate issue:

mapping(
    :bill_length_mm => (t -> t / 10) => "bill length (cm)",
    :bill_depth_mm => (t -> t / 10) => "bill depth (cm)",
)

This API seems to resemble DataFrames.transform(), but it’s not quite the same; in DataFrames we would have ByRow:

transform(
    penguins,
    :bill_length_mm => ByRow(t -> t / 10) => "bill length (cm)",
    :bill_depth_mm => ByRow(t -> t / 10) => "bill depth (cm)",
)

whereas in the proposal here the row-wise operation is implicit.

Explicit ByRow allows me to do column-wise operations by leaving out the ByRow. For example, I can apply a z-score transformation to the whole column, which I can’t do if I only have access to one element at a time, because it needs the mean and standard deviation of the column. I can also use broadcasting to act on each element more concisely: => (x -> x ./ 10) =>. Note that the anonymous function needs parentheses, otherwise the trailing => becomes part of the function body.
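
To make the row-wise versus column-wise distinction concrete, here is a minimal sketch in plain Julia using only the Statistics standard library. The measurements are made up, and to_cm and zscore are hypothetical helper names, not functions from DataFrames or AlgebraOfGraphics.

```julia
using Statistics

# Hypothetical measurements in millimetres, made up for illustration.
bill_length_mm = [39.1, 39.5, 40.3, 36.7]

# Row-wise: each value is transformed on its own, without looking at the others.
to_cm(t) = t / 10
bill_length_cm = map(to_cm, bill_length_mm)

# Column-wise: the z-score needs the mean and standard deviation of the whole column.
zscore(col) = (col .- mean(col)) ./ std(col)
bill_length_z = zscore(bill_length_mm)

# With a single element there is no spread to normalize by:
# the sample standard deviation of a one-element vector is NaN.
single_std = std([39.1])
```

The last line is the reason a z-score cannot be expressed as a purely element-by-element function.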


I’m guessing

draw(plt; axis)

mutates its axis argument? If so, I’d expect it to be draw!(axis, plt) with the ! to indicate argument-mutation and the reversed argument order following the common convention that the container argument comes first.


The facet section (Example gallery · Algebra of Graphics) says “The ‘facet style’ is only applied with an explicit call to facet!.” but the examples don’t show that.

5 Likes

Beautiful!

1 Like

Due to the dependence on Makie the following question is crucial: has it become any easier to make Makie run on systems people use, including Windows?

Last time I tried I couldn’t get it to work.

1 Like

I had the same issue with not being able to run Makie on an old Windows laptop. After upgrading to a new laptop this year it ran OK.

I’m running all Makie backends on a 4-year-old Dell XPS 13 and a desktop PC with an NVIDIA GPU, both with Windows 10, and it all runs without issue.

The API looks convenient and easy to use, so I have the following questions.
Does/will AoG work in Jupyter and Pluto notebooks, and can plots be saved as raster (PNG) and vector (PDF) files? I could not find such examples in the linked docs.
Is interactivity considered to be in scope? E.g., something simple like selections in Vega-Lite, for cross-panel highlights.

2 Likes

No, axis is just a named tuple (see the first line), so this is equivalent to draw(plt, axis=(width=225, height=225)). There is also a draw! function.
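
In case the draw(plt; axis) shorthand is unfamiliar: since Julia 1.5, writing f(; axis) is the same as f(; axis = axis). A minimal sketch with a hypothetical stand-in function (show_settings is made up here and is not part of AlgebraOfGraphics or Makie):

```julia
# Hypothetical stand-in for a plotting function: it simply returns the
# keyword argument it received, so the two call styles can be compared.
show_settings(plt; axis = NamedTuple()) = axis

axis = (width = 225, height = 225)

# Shorthand: the local variable `axis` is passed as the keyword `axis`.
a = show_settings("some plot"; axis)

# Spelled out in full.
b = show_settings("some plot", axis = (width = 225, height = 225))
```

Both calls receive the same NamedTuple. Named tuples are immutable, so there is nothing for the callee to mutate.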

Yes, using CairoMakie you can show plots in notebooks and save to PDF and PNG.

3 Likes

Thanks for the correction. Since Axis already has a meaning in Makie, I might use a different name for that.

Thanks for the feedback and nice words!

I think row / col is reasonable, as it is consistent with eachrow / eachcol in Base, nrow / ncol in DataFrames, and Row / Col in GridLayoutBase, which AlgebraOfGraphics uses for layout. On wts, on the other hand, I completely agree with you. It came from GLM.lm, but it looks like they also plan on changing to weights. I’ll change it before releasing.

I should probably add a FAQ section and add this to the FAQs. The operation is row-wise by design, and operations that take the whole column are not supported on purpose. My reasoning to prefer row-wise operations (other than performance) is that whole-column operations are error prone, especially when

  1. the data is grouped or
  2. different datasets are used.

The issue with 1. is that it is no longer completely clear whether you should apply the operation by group or globally.

To exemplify 2., in AlgebraOfGraphics you can do things like (data(df1) * visual(color = :red) + data(df2) * visual(color = :blue)) * mappings. If you were to apply, e.g., zscore here, each dataset would be normalized with its own mean and standard deviation and then plotted together, which could be questionable.

I think it’s best if the user does whole-column operations beforehand and simply adds the resulting column to the dataset.
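
A small sketch of the two-dataset pitfall, in plain Julia with the Statistics standard library. The numbers, the zscore helper and the names x1, x2 and table1 are all made up for illustration.

```julia
using Statistics

zscore(col) = (col .- mean(col)) ./ std(col)

# Two hypothetical datasets whose raw values are clearly offset from each other.
x1 = [1.0, 2.0, 3.0]
x2 = [11.0, 12.0, 13.0]

# Normalizing each dataset on its own hides the offset entirely:
# both come out as the same values, so they would overlap in a shared plot.
z1 = zscore(x1)
z2 = zscore(x2)

# Normalizing the pooled data keeps the two datasets apart.
pooled = zscore(vcat(x1, x2))
pooled1, pooled2 = pooled[1:3], pooled[4:6]

# Suggested workflow: compute the whole-column result beforehand and
# store it as an extra column (here a NamedTuple of vectors as a simple table).
table1 = (x = x1, x_z = pooled1)
```

Which of the two normalizations is the right one depends on the question being asked, which is why it seems safer to make the choice explicitly before plotting.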

Well spotted, that was a leftover from a previous version, I’ve fixed it.

I’m just using Makie syntax to pass attributes to the axis, see Plot Method Signatures · Makie Plotting Ecosystem (juliaplots.org). One way to think of it is that axis and figure are not axis and figure objects, but rather axis and figure settings.

I am on Windows and I haven’t experienced any issues. The CairoMakie backend (which I use for clean vector graphics) might well be the most robust, as it does not rely on the GPU.

See Backends & Output · Makie Plotting Ecosystem (juliaplots.org). You could probably also try WGLMakie in Pluto to have interactive plots in the browser (zooming and panning).

It’s not a high-priority item for me (I’d like to make sure the “bases” are completely solid first), but it’s definitely possible. It’ll be interesting to see what possibilities Makie opens up for this type of interactivity.

5 Likes

I think your reasoning about why to avoid column-wise operations makes sense, but I feel like the very similar syntax but different semantics from the DataFrames DSL will be confusing / error-prone. I wonder if ByRow could move to DataAPI and then AoG could just require it? Like one would write it as

:bill_length_mm => ByRow(t -> t / 10) => "bill length (cm)"

always even though there’s no column operations supported?

Also, something that I think is cool about the Pair-based syntax used in both packages is that you can save it as a variable and re-use it, like

bill_length_cm = :bill_length_mm => ByRow(t -> t / 10) => "bill length (cm)"

If the semantics were the same between AoG and DataFrames, you could actually reuse transformations between the packages as well! So you can have not only column names as re-usable variables to programmatically generate things, but also transformations.
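
As a sketch of why these specifications are reusable: a source => function => label chain is just nested Pair values in plain Julia, so it can be stored, taken apart and applied by any package. The example below is written without ByRow so that it runs without DataFrames; the column values are made up.

```julia
# The transformation is an ordinary value that can be stored in a variable.
to_cm = t -> t / 10
bill_length_cm = :bill_length_mm => to_cm => "bill length (cm)"

# `=>` is right-associative, so this is source => (function => label).
source = bill_length_cm.first
f = bill_length_cm.second.first
label = bill_length_cm.second.second

# A consumer with row-wise semantics would apply `f` to each element.
# Hypothetical column values, made up for illustration.
column = [40.0, 50.0]
converted = map(f, column)
```

Whether f is applied per element or to the whole column is entirely up to the consumer, which is exactly the semantic difference discussed above.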

By the way, the new release looks awesome! Love the penguin tutorial as well.

2 Likes

@piever There is an unexported macro in DataFramesMeta for creating the src => fun => dest syntax, called @col. It’s unexported for now, for no real reason I guess; see here.

It might be of interest to you for testing your implementation of src => fun => dest and might be a convenient macro in general if more packages copy that API.

This looks like a great package! Looking through the penguins tutorial, I was wondering why the continuous colormap (e.g. for the density plots) is different than the default in Makie (Viridis, I think)?

Viridis is definitely a good choice. The AlgebraOfGraphics theme uses Batlow. See The misuse of colour in science communication (nature.com) for a nice read about color maps, or the first author’s website Colour maps (fabiocrameri.ch), which also has a few other nice options.

My main reason for not using the default Viridis is that AoG uses the same color palette for both heatmaps and scatter plots (coloring the “fill” of the markers, which in AoG have no stroke). While Viridis works well for heatmaps, I find that the “upper range” of Viridis (green and especially yellow) is a bit problematic for markers on a light background; see for example the scatters in this blog post.

Theming attributes for plots (if I understand correctly) requires a little bit of cleanup on the Makie side, but then it should be easy to make these choices fully customizable for users.

7 Likes

The pair syntax actually originated in JuliaDB, and there it is row-wise (JuliaDB tables are distributed, so there row-wise is a very natural choice). TableOperations.transform also is row-wise, I suspect for similar reasons (the whole columns are not available in general). But your point still stands, as I suspect many more users are familiar with DataFrames than with JuliaDB.

This definitely requires some thinking. I wonder whether DataFrames plans to have some other functions with “row-wise” semantics, e.g. map(df, :x => f => "f(x)"), or whether they’ll use the whole-column approach uniformly. Recycling “pairs” across packages would indeed be nice.

That looks pretty interesting! Esp. for automated naming (and automatically creating anonymous functions) a macro can be pretty powerful.

3 Likes

One difference is that DataFrames needs to support grouped operations, where you want to act on the whole column, otherwise the grouping would be meaningless. So DataFrames chose by column as the default to avoid introducing a whole other syntax for grouped operations.

Got it. I wasn’t familiar with Batlow, and couldn’t tell if it was perceptually uniform at first glance, so thanks for the reference!

2 Likes

Is there an equivalent of coord_flip? Maybe there’s an abstraction that works better in 3D.

That’s true, but on the other hand AlgebraOfGraphics has color, which makes col ambiguous (I was actually confused by this initially; that’s what prompted me to suggest a change). I also think the API would feel more consistent without this single abbreviation, but that’s a very minor point.

1 Like