Grid plot by factors with DataFrames and Plots



I have a DataFrame with two factors, an IV, and a DV. I want to create a grid of plots based on the two factors using Plots.jl. Here is a minimal working example of the data format:

using DataFrames 
df = DataFrame()
df[:Factor1] = repeat(repeat(1:2,inner=5),inner=10)
df[:Factor2] = repeat(repeat(1:2,outer=5),inner=10)
df[:IV] = rand(100)
df[:DV] = rand(100)

I see there is a group argument, but that draws all the groups within the same plot. Is there a clever way to create a grid of plots from this data?


If your factors go from 1 to M and 1 to N respectively, the simplest approach would be:

using StatPlots, DataFrames
@df df scatter(:IV, :DV, group = {:Factor1, :Factor2}, layout = (M, N))

For more complex situations, I would have thought the following would work and be useful:

plot(layout = (M,N))
by(df, [:Factor1, :Factor2]) do dd
    @df dd scatter!(:IV, :DV, subplot = (:Factor1, :Factor2))
end

However, this fails for technical reasons: a SubDataFrame does not seem to have the IterableTables implementation, so our plotting macro can’t handle it. I’ll file an issue.




Do you have a recommendation for setting the subplot titles? For example, the first title would be “1 1”, the second “1 2”, etc. As far as I can tell, this does not seem possible without creating the subplots in a loop, which does not currently work.


I’m actually not sure how one would go about that. The only thing I can think of is that you can pass the titles as a row array (say, title = ["a" "b" "c" "d"], which assigns one title to each subplot), but you’d need to compute that array by hand.
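As a rough sketch of computing that row array rather than typing it out, the snippet below builds a 1×(M*N) matrix of titles from the factor levels. It assumes the levels are 1:M and 1:N, and that the subplots are filled in the order "1 1", "1 2", "2 1", "2 2"; that ordering is an assumption worth checking against the actual plot. The names labels and titles are only illustrative.

```julia
# Assumed number of levels of Factor1 and Factor2 in the example data
M, N = 2, 2

# One label per factor combination; Factor1 varies slowest, Factor2 fastest
labels = [string(f1, " ", f2) for f1 in 1:M for f2 in 1:N]

# Plots treats a row (1×n) array as one value per subplot/series,
# so turn the vector into a 1×(M*N) matrix
titles = permutedims(labels)

# Possible use (not run here, needs StatPlots and the df from the question):
# @df df scatter(:IV, :DV, group = {:Factor1, :Factor2},
#                layout = (M, N), title = titles)
```

If the panels come out in a different order than the titles, the comprehension can be reordered accordingly.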

If you make a lot of these plots with grouped data, you may want to consider GroupedErrors, which has a @set_attr macro for setting attributes that depend on the values of the grouping variables (maybe we should add something similar to Plots). For example:

using Plots, GroupedErrors
@> df begin
    @splitby (_.Factor1, _.Factor2)
    @x _.IV
    @y _.DV
    @set_attr :title string(_[1], " ", _[2])
    @plot scatter(layout = (2,2), legend = false)
end


Thanks, I ended up doing something similar, though perhaps not very elegant. I wrote a nested for loop, created sub-DataFrames, and added the subplots iteratively. I used title = string(factor1Value," ",factor2Value) to set the title. Perhaps one way to set the titles would be something like title = (:factor1,:factor2). I guess it wouldn’t be a general solution, but it would show the factor values in the title of each subplot.
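For reference, the loop approach might look roughly like the sketch below. It is not the exact code I used: it assumes a recent DataFrames version (keyword constructor and df.column access), and the names panels and sub are only illustrative.

```julia
using DataFrames, Plots

# Same layout as the example data in the question
df = DataFrame(Factor1 = repeat(repeat(1:2, inner = 5), inner = 10),
               Factor2 = repeat(repeat(1:2, outer = 5), inner = 10),
               IV = rand(100),
               DV = rand(100))

panels = []  # one plot per factor combination
for f1 in sort(unique(df.Factor1)), f2 in sort(unique(df.Factor2))
    # Rows belonging to this combination of factor values
    sub = df[(df.Factor1 .== f1) .& (df.Factor2 .== f2), :]
    push!(panels, scatter(sub.IV, sub.DV,
                          title = string(f1, " ", f2),
                          legend = false))
end

# Arrange the individual panels in a 2×2 grid
p = plot(panels..., layout = (2, 2))
```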

I also encountered a different, though not entirely off-topic, issue. In my actual use case, each subplot has two lines, one for a model prediction and one for the data. The legend contains a separate entry for each line (a total of 8), but I only want two generic entries, one for the model and one for the data. As far as I can tell, there is no way to modify the legend with a function such as legend!([model,data],[:red,:black]). Do you think it would be worthwhile to file an issue for a feature like this?


legend is used to set the legend position (or whether a legend is shown at all); label is the attribute that determines the text of the legend entries.


OK, I see. I think the same problem persists, though. The attached plot shows that the legend contains duplicate entries. Without something like label!(), there does not appear to be a way to remove the duplicates.
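One possible workaround, sketched here under the assumption that the panels are built in a loop: give the two series a label only in the first panel and an empty label everywhere else, and show the legend only in that first panel. To my understanding, a series with label = "" gets no legend entry, but this is worth verifying on your Plots version. The curves below are placeholder data, not the real model or measurements.

```julia
using Plots

xs = range(0, 1; length = 20)   # placeholder x values
panels = []

for (i, (f1, f2)) in enumerate((a, b) for a in 1:2 for b in 1:2)
    first_panel = i == 1
    # "Model" line: labelled only in the first panel
    p = plot(xs, xs .^ f1;
             color = :red,
             label = first_panel ? "model" : "",
             title = string(f1, " ", f2),
             legend = first_panel)
    # "Data" points: placeholder noise around the model curve
    scatter!(p, xs, xs .^ f1 .+ 0.05 .* randn(length(xs));
             color = :black,
             label = first_panel ? "data" : "")
    push!(panels, p)
end

p = plot(panels..., layout = (2, 2))
```

This leaves two generic legend entries in the first panel instead of eight across the figure.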