AlgebraOfGraphics defines a language to make data visualizations by combining basic building blocks with * and +. It is based on Makie to actually draw the plots.
The code base had grown a bit out of hand and has mostly been rewritten to make sure that the library works robustly. It will still take a little bit of work before the next release is tagged, but it’s probably a good time for users to try out the unreleased version (with ] add AlgebraOfGraphics#master) and give feedback. There are many planned features, and it would be useful to get a sense of which are most needed.
To get a sense of how the next release will work, you can try out the brand new penguin tutorial or read the philosophy section of the docs.
This looks extremely promising! I like the composition design that encourages variables with useful names and reuses the DataFrames mapping syntax. Using distributivity for shared configuration seems just right. Also impressed by your theme… pleasant and very readable.
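To make the distributivity point concrete, here is a toy model of the idea. It is not AlgebraOfGraphics’ implementation: the `Layer` and `Layers` types, the `layer` helper and the column names are all hypothetical, made up only to show how `(a + b) * c` can expand to `a * c + b * c`.

```julia
# Toy model only: Layer, Layers and layer are hypothetical, not AlgebraOfGraphics types.
struct Layer
    attrs::Dict{Symbol,Any}
end

struct Layers
    items::Vector{Layer}
end

# Build a layer from keyword arguments.
layer(; kwargs...) = Layer(Dict{Symbol,Any}(kwargs))

# `*` merges the settings of two layers into one.
Base.:*(a::Layer, b::Layer) = Layer(merge(a.attrs, b.attrs))

# `+` stacks two layers so that they are drawn together.
Base.:+(a::Layer, b::Layer) = Layers([a, b])

# Distributivity: multiplying a sum applies the factor to every term.
Base.:*(a::Layers, b::Layer) = Layers([l * b for l in a.items])

red = layer(color = :red)
blue = layer(color = :blue)
shared = layer(x = :bill_length, y = :bill_depth)  # made-up column names

combined = (red + blue) * shared
```

In this sketch both layers in `combined` carry the shared `x` and `y` settings while keeping their own color, which is the sense in which shared configuration gets factored out.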
Small feedback on the API: are you sure about abbreviations such as “col” and “wts”? I think they hurt accessibility for new users, and readability in general, which outweighs the modest gain in keystrokes. A subjective matter, of course.
whereas in the proposal here the row-wise operation is implicit.
Explicit ByRow lets me do column-wise operations simply by leaving ByRow out. For example, I can apply a z-score transformation to the whole column, which I can’t do with access to only one element at a time, because it needs the mean and standard deviation of the column. I can also use broadcasting to act on each element more concisely: => x -> x ./ 10 => .
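For readers less familiar with the DataFrames side of this comparison, a minimal sketch of the three variants follows. The data frame, the output column names and the `zscore_col` helper are made up for illustration; `transform`, `ByRow`, `mean` and `std` are the regular DataFrames and Statistics functions.

```julia
using DataFrames, Statistics

df = DataFrame(x = [10.0, 20.0, 30.0])

# Row-wise: ByRow applies the function to one element at a time.
rowwise = transform(df, :x => ByRow(v -> v / 10) => :x_scaled)

# Column-wise: without ByRow the function receives the whole column,
# so it can use column statistics such as the mean and standard deviation.
zscore_col(col) = (col .- mean(col)) ./ std(col)  # hypothetical helper
colwise = transform(df, :x => zscore_col => :x_z)

# Broadcasting inside a column-wise function acts on each element.
# The parentheses keep `=> :x_scaled` out of the anonymous function body.
broadcasted = transform(df, :x => (col -> col ./ 10) => :x_scaled)
```

The row-wise and broadcast versions give the same scaled column here; only the column-wise form can compute the z-score.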
mutates its axis argument? If so, I’d expect it to be draw!(axis, plt) with the ! to indicate argument-mutation and the reversed argument order following the common convention that the container argument comes first.
The api looks convenient and easy to use, so I have the following questions.
Does/will AoG work in jupyter and pluto notebooks, and can it be saved as raster (png) and vector (pdf) plots? I could not find such examples in the linked docs.
Is interactivity considered to be in scope? E.g., something simple like selections in vegalite, for cross-panel highlights.
I think row / col is reasonable, as it is consistent with eachrow / eachcol in Base, nrow / ncol in DataFrames, and Row / Col in GridLayoutBase, which AlgebraOfGraphics uses for layout. On wts, on the other hand, I completely agree with you. It came from GLM.lm, but it looks like they also plan on changing to weights. I’ll change it before releasing.
I should probably add a FAQ section and add this to the FAQs. The operation is row-wise by design, and operations that take the whole column are not supported on purpose. My reasoning to prefer row-wise operations (other than performance) is that whole-column operations are error prone, especially when
1. the data is grouped or
2. different datasets are used.
The issue with 1. is that it is no longer completely clear whether you should apply the operation by group or globally.
To exemplify 2., in AlgebraOfGraphics you can do things like (data(df1) * visual(color = :red) + data(df2) * visual(color = :blue)) * mappings. If you were to apply, e.g., zscore here, you would be normalizing each dataset differently and then plotting them together, which could be questionable.
I think it’s best if the user does whole-column operations beforehand and simply adds the resulting column to the dataset.
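A minimal sketch of the grouping ambiguity and of the "do it beforehand" approach, using only DataFrames and Statistics. The data, the column names and the `standardize` helper are made up for illustration.

```julia
using DataFrames, Statistics

df = DataFrame(group = ["a", "a", "b", "b"], x = [1.0, 3.0, 10.0, 30.0])

# Hypothetical helper: z-score of a whole column.
standardize(col) = (col .- mean(col)) ./ std(col)

# Global: one mean and one standard deviation for the whole column.
global_z = transform(df, :x => standardize => :x_z)

# By group: each group is normalized with its own statistics.
grouped_z = transform(groupby(df, :group), :x => standardize => :x_z)
```

The two results differ, so the choice has to be made explicitly. Once the new column is stored in the data frame, it can be used like any other column when plotting.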
Well spotted, that was a leftover from a previous version, I’ve fixed it.
It’s not a high-priority item for me (I’d like to make sure the “bases” are completely solid first), but it’s definitely possible. It’ll be interesting to see what possibilities Makie opens up for this type of interactivity.
I think your reasoning about why to avoid column-wise operations makes sense, but I feel like the very similar syntax but different semantics from the DataFrames DSL will be confusing / error-prone. I wonder if ByRow could move to DataAPI and then AoG could just require it? Like one would write it as
If the semantics were the same between AoG and DataFrames, you could actually reuse transformations between the packages as well! So you could have not only column names as reusable variables to programmatically generate things, but also transformations.
By the way, the new release looks awesome! Love the penguin tutorial as well.
This looks like a great package! Looking through the penguins tutorial, I was wondering why the continuous colormap (e.g. for the density plots) is different than the default in Makie (Viridis, I think)?
My main reason for not using the default Viridis is that AoG uses the same color palette for both heatmaps and scatter plots (coloring the “fill” of the markers, which in AoG have no stroke). While Viridis works well for heatmaps, I find that the “upper range” of Viridis (green and especially yellow) is a bit problematic for markers on a light background; see for example the scatters in this blog post.
Theming attributes for plots (if I understand correctly) requires a little bit of cleanup on the Makie side, but then it should be easy to make these choices fully customizable for users.
The pair syntax actually originated in JuliaDB, and there it is row-wise (JuliaDB tables are distributed, so there row-wise is a very natural choice). TableOperations.transform also is row-wise, I suspect for similar reasons (the whole columns are not available in general). But your point still stands, as I suspect many more users are familiar with DataFrames than with JuliaDB.
This definitely requires some thinking. I wonder whether DataFrames plans to have some other functions with “row-wise” semantics, e.g. map(df, :x => f => "f(x)"), or whether they’ll use the whole-column approach uniformly. Recycling “pairs” across packages would indeed be nice.
That looks pretty interesting! Esp. for automated naming (and automatically creating anonymous functions) a macro can be pretty powerful.
One difference is that DataFrames needs to support grouped operations, where you want to act on the whole column, otherwise the grouping would be meaningless. So DataFrames chose by column as the default to avoid introducing a whole other syntax for grouped operations.
That’s true, but on the other hand AlgebraOfGraphics has color which makes it ambiguous (I was actually confused by this initially, that’s what prompted me to suggest a change). I also think the API would feel more consistent without this single abbreviation but that’s a very minor point.