Handle numerical values as categories in plots

I am trying to plot boxplots of values with respect to 4 categories represented as numbers. The four categories are powers of two: 16384, 32768, 65536, 131072.

Let’s use this MWE dataframe:

using DataFrames, StatsPlots

df = DataFrame(cat = [16384, 32768, 65536, 131072, 16384, 32768, 65536, 131072, 16384, 32768, 65536, 131072], val = rand(12))

There are two possibilities. First, I can plot the data without any change:
boxplot(df.cat, df.val)
But as you can see in this image, the boxes are not evenly spaced, since the categories are placed at their numeric values.

If I convert the categories to strings, I get evenly spaced boxes, but then the values are sorted alphabetically rather than numerically:
boxplot(string.(df.cat), df.val)
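
To show where the alphabetical order comes from, here is a small sketch in plain Julia with no plotting involved (the variable names are only for illustration). Strings compare character by character, so "131072" ends up before "16384":

```julia
# The four categories from the example above.
cats = [16384, 32768, 65536, 131072]

# Sorting the numbers keeps the expected order.
sorted_numbers = sort(cats)

# Sorting their string forms compares character by character:
# "131072" vs "16384" is decided at the second character ('3' < '6').
sorted_labels = sort(string.(cats))

println(sorted_numbers)  # [16384, 32768, 65536, 131072]
println(sorted_labels)   # ["131072", "16384", "32768", "65536"]
```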

I can’t find a way to order this correctly. Does anyone know how?

I think you are looking for the xdiscrete_values keyword, e.g.:

using DataFrames, StatsPlots
df = DataFrame(cat = [16384, 32768, 65536, 131072, 16384, 32768, 65536, 131072, 16384, 32768, 65536, 131072], val = rand(12))
boxplot(string.(df.cat), df.val, xdiscrete_values=string.(df.cat))
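
For comparison, a different route that does not rely on xdiscrete_values might look like the sketch below. It is not the approach from the snippet above; it is a minimal alternative that assumes StatsPlots is installed. Each category is mapped to an integer position 1, 2, 3, ... and the original numbers are used only as tick labels. The names levels, pos, and labels are made up for this illustration.

```julia
using StatsPlots

# Same shape of data as the MWE, as plain vectors.
cat = repeat([16384, 32768, 65536, 131072], 3)
val = rand(12)

# Sorted unique categories define the left-to-right order.
levels = sort(unique(cat))

# Position of each observation's category within `levels` (1-based).
pos = searchsortedfirst.(Ref(levels), cat)

# Tick labels are the original numbers, rendered as strings.
labels = string.(levels)

# Boxes sit at evenly spaced integer positions, labelled numerically.
boxplot(pos, val, xticks = (1:length(levels), labels), legend = false)
```

Because the order comes from sorting the numbers before converting them to strings, the boxes stay in numeric order whatever the labels look like.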

Yes, that’s exactly what I needed, thank you! There are a lot of keywords and aliases in the package, and it’s sometimes hard to find what you need in the docs.

Where did you find it, by the way? I don’t see it in the attributes section of the documentation.

Yes, I can sympathize with that sentiment; some of the Plots.plot() arguments could be better documented.

The xdiscrete_values keyword argument is at the bottom of the gr() backend documentation page. While this page does not describe how each argument affects the plot, I use it as a starting point if I need to explore new functionality.
