I want to create a box plot of data in a DataFrame grouped by more than one variable (column value). Is there a compact high-level way to do this? The following does not work:
```
using DataFrames, StatsPlots
dfa = DataFrame(a = [0, 1, 0, 1, 0, 1, 0, 1],
                b = [1, 1, 1, 1, 2, 2, 2, 2],
                dat = randn(8))
@df dfa boxplot(:dat, group=([:a, :b]))
```
Not sure if this is what you need, but just in case:
using DataFrames, StatsPlots
dfa = DataFrame(a = [0, 1, 0, 1, 0, 1, 0, 1],
                b = [1, 1, 1, 1, 2, 2, 2, 2],
                dat = randn(8))
gdf = groupby(dfa, [:a, :b])
plot(legend=:outertopright)
# Add one box per (a, b) group, labelled with that group's key.
# Taking the label from the key avoids a global counter, which fails inside a top-level loop in a script.
for (k, v) in pairs(gdf)
    @df v boxplot!(:dat, label="$(NamedTuple(k))")
end
Plots.current()
AFAICT, @df dfa boxplot(:dat, group=(:a, :b))
should work, and it's a bug if it doesn't.
It does not work properly: although there are 4 categories (which the legend reflects accurately), it draws 8 boxes, one for each data point.
Someone kindly answered on GitHub. The first argument needs to be the group labels:
@df dfa boxplot(string.(tuple.(:a, :b)), :dat, group=(:a, :b))
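For anyone who wants to copy and paste, here is a minimal self-contained sketch of the same idea. It is a slight variation rather than the exact GitHub answer: it stores the labels in a column first (the column name `grp` is made up for illustration) and groups on that column, so the legend entries match the x-axis labels.

```julia
using DataFrames, StatsPlots

dfa = DataFrame(a = [0, 1, 0, 1, 0, 1, 0, 1],
                b = [1, 1, 1, 1, 2, 2, 2, 2],
                dat = randn(8))

# One label per row, built from the two grouping columns, e.g. "(0, 1)".
# `grp` is a hypothetical column name chosen for this example.
dfa.grp = string.(tuple.(dfa.a, dfa.b))

# The labels are passed as the x values, so rows that share a label
# are collected into the same box instead of getting one box per row.
@df dfa boxplot(:grp, :dat, group = :grp, legend = :outertopright)
```

With the sample data this should give four boxes, one per (a, b) combination, each built from two data points.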