I have a DataFrame with two factors, an IV and a DV. I want to create a grid of plots based on the two factors using Plots.jl. Here is a minimal working example of the data format:
If your factors go from 1 to M and 1 to N respectively, the simplest would be:
using StatPlots, DataFrames
@df df scatter(:IV, :DV, group = (:Factor1, :Factor2), layout = (M, N))
For more complex situations, I would have thought the following would work and be useful:
plot(layout = (M,N))
by(df, [:Factor1, :Factor2]) do dd
@df dd scatter!(:IV, :DV, subplot = (:Factor1, :Factor2))
end
But it fails for technical reasons. Somehow a SubDataFrame does not have the IterableTables implementation, so our plotting macro can’t handle it. I’ll file an issue.
Do you have a recommendation for setting the sub-titles? For example, the first title would be “1 1”, the second “1 2” etc? As far as I can tell, it does not seem possible without creating the subplots in a loop, which does not work currently.
I’m actually not sure how one would go about that. The only thing I can think of is that you can pass the titles as a row array (say title = ["a" "b" "c" "d"] to pass it to all subplots), but you’d need to compute that array by hand.
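For what it's worth, the row array of titles doesn't have to be typed out by hand. Here is a minimal sketch in plain Julia that builds it, assuming the factors run from 1 to M and 1 to N and that the groups end up in the subplots in sorted order ("1 1", "1 2", ...). I have not checked that ordering against the grouping code, so treat it as an assumption:

```julia
# Assumed grid size: factors go from 1 to M and 1 to N.
M, N = 2, 2

# Build the titles with the first factor varying slowest, then turn the
# vector into a 1×(M*N) row array so each entry applies to one subplot.
titles = permutedims([string(i, " ", j) for i in 1:M for j in 1:N])

println(titles)  # 1×4 row array: "1 1" "1 2" "2 1" "2 2"
```

The result could then be passed as title = titles next to layout = (M, N) in the scatter call.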
If you make a lot of these plots with grouped data, you may want to consider GroupedErrors, which has a @set_attr macro for setting attributes that depend on the values of the grouping variables (we should maybe add something similar to Plots). For example:
using Plots, GroupedErrors
@> df begin
@splitby (_.Factor1, _.Factor2)
@x _.IV
@y _.DV
@set_attr :title string(_[1], " ", _[2])
@plot scatter(layout = (2,2), legend = false)
end
Thanks, I ended up doing something similar, though perhaps not very elegant. I wrote a nested for loop that creates sub-DataFrames and adds the subplots iteratively. I used title = string(factor1Value," ",factor2Value) to set the title. Perhaps one way to set the titles would be something like title = (:factor1,:factor2). I guess it wouldn’t be a general solution, but it would show the factor values in the title of each subplot.
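For anyone reading later, the loop approach might look roughly like the sketch below. The data are made up purely for illustration, the column names follow the ones used earlier in the thread, and the df.Factor1 column syntax assumes a recent DataFrames version. It also assumes subplots are numbered row by row, which is why the index is (i - 1) * N + j:

```julia
using DataFrames, Plots

# Made-up data for illustration only: two rows per factor combination.
df = DataFrame(Factor1 = repeat(1:2, inner = 4),
               Factor2 = repeat(1:2, inner = 2, outer = 2),
               IV = repeat([1.0, 2.0], 4),
               DV = collect(1.0:8.0))

M, N = 2, 2
plt = plot(layout = (M, N), legend = false)

for i in 1:M, j in 1:N
    # Select the rows that belong to this combination of factor values.
    sub = df[(df.Factor1 .== i) .& (df.Factor2 .== j), :]
    # Assumes row-by-row subplot numbering, so cell (i, j) is (i - 1) * N + j.
    scatter!(plt, sub.IV, sub.DV,
             subplot = (i - 1) * N + j,
             title = string(i, " ", j))
end

plt
```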
I also encountered a different but not entirely off-topic issue. In actuality, each subplot has two lines, one for a model prediction and one for the data. The legend contains a separate entry for each line (a total of 8), but I only want two generic entries, one for the model and one for the data. As far as I can tell, there is no way to modify the legend with a function such as legend!([model,data],[:red,:black]). Do you think it would be worthwhile to file an issue for a feature like this?
Ok, I see. I think the same problem persists. The attached plot shows that the legend contains duplicate entries. Without something like label!(), there does not appear to be a way to remove the duplicates.