Automating the Best Layout for Plotting Variables?

Hi folks!

I have been struggling with how best to plot variables that are not always guaranteed to be present in the data. Let me show an example of what I am talking about:

Suppose I have a dataframe that looks something like this:

10×4 DataFrame
 Row │ age_group  gender_concept_id  race_concept_id  count
     │ String?    Int32?             Int32?           Int64
─────┼──────────────────────────────────────────────────────
   1 │ 50 - 59                 8532             8527    403
   2 │ 40 - 49                 8532             8515     51
   3 │ 30 - 39                 8507             8516     20
   4 │ 50 - 59                 8507             8527    445
   5 │ 40 - 49                 8532             8516     59
   6 │ missing                 8532             8527    139
   7 │ 60 - 69                 8532             8516     43
   8 │ 50 - 59                 8532             8516     69
   9 │ 80 - 89                 8532             8527    102
  10 │ 30 - 39                 8507             8527    114

Then, I have a bespoke plotting function that generates a figure containing multiple subplots from this data.

Now suppose that I receive a similar dataframe that only has age group and count. I’d then want the same function to work on that reduced data and generate a single plot.

I have been exploring how one might build a function around something like AlgebraOfGraphics that uses faceting, so that passing in a dataframe loosely gives this kind of automatic plotting. But I am still not sure. I realize this problem asks for a solution that does a bit of mind-reading, but it feels like some aspects of it should be automatable.

Does anyone have any ideas on how to approach this and if AlgebraOfGraphics might be a good place to start looking? Happy to provide more details as it is a bit of a vague problem description.

Cheers!

~ tcp :deciduous_tree:

First of all it sounds like you need a mechanism to update the data of a plot or some derivative of the data.

You may be interested in Observables.jl (or alternatively Reactive.jl). This allows you to update a plot when data is updated.
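As a minimal sketch of the idea, assuming Observables.jl is installed: the variable names below are made up for illustration, and the numbers are the counts from the first rows of the example table.

```julia
using Observables

# Counts taken from the first three rows of the example table.
counts = Observable([403, 51, 20])

# A derived observable that is recomputed whenever `counts` changes.
total = map(sum, counts)

# Register a listener; a plotting call could go here instead of println.
on(total) do t
    println("total count is now ", t)
end

# Assigning through `[]` notifies listeners, so `total` updates as well.
counts[] = [403, 51, 20, 445]
```

This only covers updating values that feed an existing plot. It does not by itself decide which columns become facets or colours.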

Next I think you should consider Makie’s Observables interaction:

https://docs.makie.org/stable/explanations/nodes/

Finally, for the integration step, you can look at Makie’s recipes system.

https://docs.makie.org/stable/explanations/recipes/

It is hard to imagine that one can automate this in a good way. How would your code know which factor to set as colour, which to facet as row, which as column, etc.?

I guess one could look at the number of unique values in each column?
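A rough sketch of that heuristic in plain Julia, with no plotting involved. The helper name rank_by_cardinality is hypothetical, and the rule "fewest unique values first" is only one possible choice, not something the thread settles on. The data are the first five rows of the example table, stored as a NamedTuple of column vectors.

```julia
# Grouping columns from the first five rows of the example table.
cols = (
    age_group = ["50 - 59", "40 - 49", "30 - 39", "50 - 59", "40 - 49"],
    gender_concept_id = [8532, 8532, 8507, 8507, 8532],
    race_concept_id = [8527, 8515, 8516, 8527, 8516],
)

# Hypothetical helper: count the distinct non-missing values in each column
# and return `name => count` pairs sorted from fewest to most distinct values.
function rank_by_cardinality(cols)
    counts = [name => length(unique(skipmissing(col))) for (name, col) in pairs(cols)]
    return sort(counts; by = last)
end

ranked = rank_by_cardinality(cols)
```

One could then, for instance, give the column with the fewest levels to colour and the next ones to row or column facets. Whether that produces a readable figure still depends on the data.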

AlgebraOfGraphics does a pretty good job at making it easy to go from DataFrame to faceted plot (not quite as good as ggplot, but getting there).

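I think you can build up the mapping element by element. So you could have lines like "x" in names(df) && (m *= mapping(row = :x)), where m holds the mapping built so far. Since names(df) returns strings, the test has to be against "x" rather than :x. Note: I haven’t tested this, but from the examples, it seems you can.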
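Here is a hedged sketch of that element-by-element approach, assuming AlgebraOfGraphics, CairoMakie and DataFrames are installed. The function name build_layers, the choice of BarPlot, and the assignment of gender to columns and race to rows are all assumptions made for illustration. The data are the first five rows of the example table.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames

df = DataFrame(
    age_group = ["50 - 59", "40 - 49", "30 - 39", "50 - 59", "40 - 49"],
    gender_concept_id = [8532, 8532, 8507, 8507, 8532],
    race_concept_id = [8527, 8515, 8516, 8527, 8516],
    count = [403, 51, 20, 445, 59],
)

# Hypothetical helper: start from the (x, y) mapping and multiply in a facet
# mapping only for the grouping columns that are present in `df`.
function build_layers(df)
    layers = mapping(:age_group, :count)
    if "gender_concept_id" in names(df)
        # `nonnumeric` makes the integer codes categorical so they can facet.
        layers *= mapping(col = :gender_concept_id => nonnumeric)
    end
    if "race_concept_id" in names(df)
        layers *= mapping(row = :race_concept_id => nonnumeric)
    end
    return data(df) * layers * visual(BarPlot)
end

# Full dataframe: a grid of subplots.
fig_full = draw(build_layers(df))

# Only age group and count (summed per age group): a single plot.
df_small = combine(groupby(df, :age_group), :count => sum => :count)
fig_small = draw(build_layers(df_small))
```

The same function handles both dataframes because a missing column simply contributes no facet mapping.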

If you are willing to provide the algorithm with the list of potential grouping variables, the solution that I use works basically as follows:

  1. User provides a vector of grouping variables. That could be length 1, 2, or 3. Of course, the algorithm could also take a vector of length 3 and just ignore variables that don’t exist in your DataFrame.
  2. The first variable becomes the color argument of the AoG mapping. I have a function color_mapping(colorVar) that returns mapping(color = ...) or just mapping() when there is no grouping (a single series).
  3. The second variable becomes the layout mapping. There is again a function layout_mapping(layoutVar) which returns either mapping(layout = dims(1) => ...) or just mapping() if there are no subplots.
  4. The third variable handles sub-figures, if needed.

Now you can assemble your plot by chaining the (x,y) mapping * color mapping * layout mapping.
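The steps above could be sketched roughly as follows. This is not the poster's actual code: color_mapping and layout_mapping are named after the functions described, but their bodies are guesses, present_vars and grouped_spec are hypothetical, and the layout mapping here uses a long-format column rather than dims(1). The third step (sub-figures) is left out, and Scatter is an arbitrary choice of plot type.

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames

# Guessed implementations; `nothing` means "no such grouping variable".
color_mapping(colorVar) =
    isnothing(colorVar) ? mapping() : mapping(color = colorVar => nonnumeric)
layout_mapping(layoutVar) =
    isnothing(layoutVar) ? mapping() : mapping(layout = layoutVar => nonnumeric)

# Keep only the requested grouping variables that exist in `df`, in order.
present_vars(df, groupVars) = [v for v in groupVars if String(v) in names(df)]

# Chain the (x, y) mapping with the color and layout mappings.
function grouped_spec(df, xVar, yVar, groupVars)
    vars = present_vars(df, groupVars)
    colorVar = get(vars, 1, nothing)   # first available variable, if any
    layoutVar = get(vars, 2, nothing)  # second available variable, if any
    return data(df) * mapping(xVar, yVar) * color_mapping(colorVar) *
           layout_mapping(layoutVar) * visual(Scatter)
end

# First five rows of the example table.
df = DataFrame(
    age_group = ["50 - 59", "40 - 49", "30 - 39", "50 - 59", "40 - 49"],
    gender_concept_id = [8532, 8532, 8507, 8507, 8532],
    race_concept_id = [8527, 8515, 8516, 8527, 8516],
    count = [403, 51, 20, 445, 59],
)

groupVars = [:gender_concept_id, :race_concept_id]
fig = draw(grouped_spec(df, :age_group, :count, groupVars))

# With only age group and count, both grouping mappings collapse to mapping().
fig_single = draw(grouped_spec(select(df, :age_group, :count), :age_group, :count, groupVars))
```

Because mapping() contributes nothing to the product, the same chain yields a coloured, faceted figure or a single plain plot depending on which columns are present.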
