How to adjust the X-axis order of boxplot with StatsPlots.jl

using Plots, StatsPlots, DataFrames

df = DataFrame(
    id=repeat([3, 17, 21], inner=10),
    value=randn(30)
)

@df df boxplot(string.(:id), :value, fillalpha=0.75, linewidth=1)

The result is:

[image: the resulting boxplot]

I want the boxes to appear in the order 3, 17, 21 from left to right.

PS: I have also tried CategoricalArray for the id column, but nothing changed.


Normally you can just pass id as the x values, but then the boxes are drawn at their numerical positions (x = 3, 17 and 21), so they are unevenly spaced.
In this case, the best option is probably either to replace 3, 17 and 21 with 1, 2 and 3 and change the xticks to show 3, 17 and 21, or to pad the values (strings are sorted alphabetically):

@df df boxplot(lpad.(string.(:id), 2), :value, fillalpha=0.75, linewidth=1)
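The other suggestion above, replacing the ids with 1, 2, 3 and relabelling the ticks, might look roughly like the sketch below. The names `idlevels`, `xpos` and `ticks` are made up for illustration. The plotting call is left as a comment and assumes that `boxplot` accepts numeric x values together with the usual Plots `xticks=(positions, labels)` attribute.

```julia
# Toy data mirroring the example in this thread.
ids  = repeat([3, 17, 21], inner=10)
vals = randn(30)

# Sort the distinct ids numerically (not as strings).
idlevels = sort(unique(ids))

# Give every observation the slot (1, 2 or 3) of its id.
xpos = [searchsortedfirst(idlevels, id) for id in ids]

# Tick positions and the labels to show at those positions.
ticks = (1:length(idlevels), string.(idlevels))

# With StatsPlots loaded, the call would then be something like:
# boxplot(xpos, vals; xticks=ticks, fillalpha=0.75, linewidth=1, legend=false)
```

The boxes are then evenly spaced, and the tick labels stay exactly "3", "17" and "21" without any padding.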

Thanks. I figured out a similar solution today (by replacing original id values with properly lexicographically ordered ones) but it seemed a little ugly and introduced unnecessary noise.

It would be much better if there were an option, such as order, to specify the order directly.

What syntax would you suggest for that?

For this type of problem, @sprintf is our best friend:

using Plots, StatsPlots, DataFrames, Printf

df = DataFrame(
    id=repeat([3, 17, 21], inner=10),
    value=randn(30)
)

str = [ @sprintf("%02i", x) for x in df.id ] 
@df df boxplot(str, :value, fillalpha=0.75, linewidth=1)

# or to keep similar syntax, one may define a sprintf02i() function:
sprintf02i(vecx) = [ @sprintf("%02i", x) for x in vecx ] 
@df df boxplot(sprintf02i(:id), :value, fillalpha=0.75, linewidth=1)

[image: StatsPlot_Boxplot_lexicographic_order]


I personally prefer

lpad.(string.(:id), 2)

to

str = [ @sprintf("%02i", x) for x in df.id ] 

but that’s a question of taste I guess
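For anyone comparing the two, here is a small sketch of what each variant produces and how the resulting strings sort. It uses only Base and the Printf standard library; the variable names are just for illustration.

```julia
using Printf   # standard library, only needed for @sprintf

ids = [3, 17, 21]

plain  = string.(ids)                          # ["3", "17", "21"]
padded = lpad.(plain, 2)                       # [" 3", "17", "21"]
zeroed = [@sprintf("%02i", x) for x in ids]    # ["03", "17", "21"]

sort(plain)    # ["17", "21", "3"], alphabetical rather than numerical
sort(padded)   # [" 3", "17", "21"]
sort(zeroed)   # ["03", "17", "21"]
```

Both workarounds sort as intended. The visible difference is the label itself: space padding leaves " 3", while the @sprintf version gives "03".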


@mkborregaard, sorry for having overlooked your answer, much simpler indeed.


Hi @mkborregaard, I don't know the internal details, but how about something like

@df df boxplot(string.(:id), :value, fillalpha=0.75, linewidth=1, xorder=[3, 17, 21])

Or we could pass in a custom sort function that sorts the specific column of the df:

@df df boxplot(string.(:id), :value, fillalpha=0.75, linewidth=1, 
                         xsort=sort)

The above two lpad and sprintf workarounds work well for this particular example. But what if I want the order to be 21 3 17? Of course, I can still find a workaround, but it does not look beautiful.
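Until something like the proposed keyword exists, an arbitrary order such as 21, 3, 17 could be emulated on the user side. The helper below, `xorder_args`, is hypothetical and not part of Plots or StatsPlots. It is a minimal sketch that assumes `boxplot` takes numeric x positions and the standard `xticks=(positions, labels)` attribute.

```julia
# Hypothetical helper (not a Plots/StatsPlots function): map each id to its
# slot in a user-supplied order and build matching tick labels.
function xorder_args(ids, order)
    slot = Dict(v => i for (i, v) in enumerate(order))
    xpos = [slot[id] for id in ids]   # throws a KeyError if an id is missing from `order`
    ticks = (1:length(order), string.(order))
    return xpos, ticks
end

ids = repeat([3, 17, 21], inner=10)
xpos, ticks = xorder_args(ids, [21, 3, 17])

# With StatsPlots loaded, the call would then be something like:
# boxplot(xpos, randn(30); xticks=ticks, fillalpha=0.75, linewidth=1)
```

Here id 21 lands in slot 1, id 3 in slot 2 and id 17 in slot 3, and the tick labels are not padded.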


I happened to work a bit with boxplots yesterday. It would be really nice if there was a recipe for Pair{String, AbstractArray} so you could make a boxplot from ["a"=>[1,5,9], "b"=>[7,7,19]]. I might just write one.
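This is not the recipe itself, but as a stopgap the pairs can be flattened by hand into the long-form x and y vectors that `boxplot(x, y)` takes. A minimal sketch using only Base; `groups`, `x` and `y` are illustrative names.

```julia
groups = ["a" => [1, 5, 9], "b" => [7, 7, 19]]

# Repeat each label once per observation, then concatenate the groups.
x = reduce(vcat, [fill(label, length(obs)) for (label, obs) in groups])
y = reduce(vcat, [obs for (_, obs) in groups])

# With StatsPlots loaded, the call would then be something like:
# boxplot(x, y)
```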

@Shuhua, the following seems to work by padding the strings with spaces on both sides, if elegance is not a concern:

using Plots, StatsPlots, DataFrames

df = DataFrame(
    id=repeat([3, 17, 21], inner=10),
    value=randn(30)
)
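xorder = [21, 3, 17]                  # desired left-to-right order
Nx = length(xorder)
str = fill("", length(df.id))
for (i, xi) in enumerate(xorder)
    j = findall(==(xi), df.id)        # rows belonging to this id
    si = " "^(Nx - i)                 # earlier boxes get more spaces, so they sort first
    @. str[j] = si * string(df.id[j]) * si
end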

# solution by left-right space-padding strings
@df df boxplot(str, :value, fillalpha=0.75, linewidth=1)

[image: StatsPlot_Boxplot_lexicographic_order3]
