# Show significance on boxplots

I want to annotate the statistical significance of the difference between a pair of boxplots on my figure, as one of `'n.s.', '*', '**', '***'`. An example image of what I want is below. The best analogue I can find is ggsignif in R (GitHub: const-ae/ggsignif, "Easily add significance brackets to your ggplots").

Does a functionality like this exist in `StatsPlots.boxplot`?

If no such functionality exists, I can easily get one of `'n.s.', '*', '**', '***'` from the p-value of a given hypothesis test in HypothesisTests. However, I’m struggling to annotate the plot correctly. I could use `annotate!(x, y, text("*", :center, 8))`, but I’m not sure how I’d know which x, y to pick to place the text correctly above the box. Does anyone have any suggestions?
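For reference, the p-value-to-label step might look like the minimal sketch below. The function name `significance_label` is made up for illustration, and the cutoffs 0.05 / 0.01 / 0.001 are an assumed convention rather than anything defined by HypothesisTests.

```julia
# Map a p-value to one of "n.s.", "*", "**", "***".
# ASSUMPTION: the cutoffs 0.05 / 0.01 / 0.001 are the usual convention;
# adjust them to whatever your field or journal expects.
function significance_label(p::Real)
    0 <= p <= 1 || throw(ArgumentError("p-value must lie in [0, 1], got $p"))
    p < 0.001 && return "***"
    p < 0.01  && return "**"
    p < 0.05  && return "*"
    return "n.s."
end

significance_label(0.03)  # "*"
```

The p-value itself could come from something like `pvalue(MannWhitneyUTest(a, b))` in HypothesisTests, with `a` and `b` being the two samples being compared.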

Wouldn’t a color with a legend be more intuitive for people who have never come across this “symbol jargon”? I have used boxplots for a while and have never come across ***, n.s., etc. in papers.

Yeah, colour’s a really nice idea, I’ll look at that. Unfortunately this notation is the standard in my field (developmental biology), so I can’t really dismiss it.

You can annotate StatsPlots boxplot as follows:

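``````
using StatsPlots, DataFrames
using Statistics  # for mean
theme(:ggplot2)

# INPUT DATA: 10 random integers per group, flattened into long format
X = ["setosa", "versicolor", "virginica"]
Y = [rand(-3:3, 10) for _ in 1:length(X)]
X2 = [fill(x, length(y)) for (x, y) in zip(X, Y)]
df = DataFrame(X = reduce(vcat, X2), Y = reduce(vcat, Y))

# PLOT DATA:
p = @df df boxplot(:X, :Y, c=:black, fillcolor=:white, legend=false)

# Add headroom above the data for the bracket and its label
ymin, ymax = ylims(p)
dy = (ymax - ymin)/25
ymax += dy
xt = xticks(p[1])[1]  # x positions of the category ticks

# Draw the bracket between the 2nd and 3rd boxes and label it
plot!(xt[2:3], [ymax, ymax], c=:black, ylims = (ymin, ymax + dy))
annotate!(mean(xt[2:3]), ymax + dy/2, text("***", 10))
``````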

the result is:

Thanks, and how could I get a second bar like this one in red?

Your solution’s great for annotating a comparison with the highest box, but I’d like to be able to annotate these bars at a consistent height above each box. Can I access the y values for the whiskers in each box?

Would something like this be to your liking?

Not really. If boxes `virginica` and `setosa` are adjacent, I’d like the line to sit at the same height above the taller whisker of the two as it does above the whisker for `versicolor`.

Basically, for any pair of boxes `b1, b2`, I want the line to be at height `max{whiskers(b1), whiskers(b2)} + dy`, for some `dy` constant across the plot.
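As a sketch of that rule, assuming the upper-whisker values were already known (the vector `whisker_tops` and the function name `bracket_height` are made up for illustration, as are the numbers):

```julia
# `whisker_tops` is a HYPOTHETICAL vector holding the upper-whisker value of
# each box, in the order the boxes appear on the x axis.
# The bracket for boxes i and j sits `dy` above the taller of the two whiskers.
bracket_height(whisker_tops, i, j, dy) = max(whisker_tops[i], whisker_tops[j]) + dy

whisker_tops = [3.0, 2.0, 1.0]  # illustrative values only
dy = 0.25                       # constant offset used across the whole plot

bracket_height(whisker_tops, 2, 3, dy)  # 2.25
bracket_height(whisker_tops, 1, 3, dy)  # 3.25
```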

Take my last plot and annotate it by hand, please.

Here I’m assuming that `setosa` has an upper whisker >= `virginica`. In either case, the line is always `dy` units above the tallest whisker.

OK, now it is clear. So you do not care about the outlier points displayed beyond the whiskers on the boxplot.

I don’t, no.

Annotating as you suggest can lead to situations like the one shown below:

If there is a web resource showing more examples it would be helpful.

This recent FEX contribution (Matlab) seems to cover many of those cases

In cases like this I’d re-order the x axis to prevent these things occurring. How did you alter your code to achieve this?

Joaquim, that is the Lamborghini of boxplots. Way beyond my skills and time.

The built-in logic seems to annotate mostly at the top, never going across the data, which doesn’t seem to be sorted:

I guess what I’m really asking for (to save you time) is if there’s a way to get the y values of the whiskers for each plot (excluding outliers)?
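One option that avoids reaching into the plot object is to recompute the whisker ends from the raw data. The sketch below assumes the usual Tukey rule (whiskers extend to the most extreme points within 1.5 × IQR of the quartiles). That should correspond to the default `whisker_range = 1.5` of the StatsPlots boxplot, but that correspondence is an assumption worth checking against your own figure. The function name `whisker_ends` is made up for illustration.

```julia
using Statistics  # quantile

# Whisker ends under the Tukey rule: the most extreme data points lying
# within `whisker_range` * IQR of the first and third quartiles.
# Anything beyond those limits is treated as an outlier and ignored.
# ASSUMPTION: this mirrors how the plotting recipe places its whiskers.
function whisker_ends(y; whisker_range = 1.5)
    q1, q3 = quantile(y, [0.25, 0.75])
    limit = whisker_range * (q3 - q1)
    inside = filter(v -> q1 - limit <= v <= q3 + limit, y)
    return extrema(inside)  # (lower whisker, upper whisker)
end

y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # 30 lies beyond the upper limit
lo, hi = whisker_ends(y)             # (1, 9)
```

Applied per group, the second element gives the upper-whisker value needed to place a bracket a fixed `dy` above the taller of two boxes.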

``````
p = @df df boxplot(:X, :Y, c=:black, fillcolor=:white, legend=false)