Violin plot

miguelborrero · April 24, 2025, 5:23pm

Hi there,

It seems like violin plots truncate the support of the data. My understanding is that violin plots are kernel density estimates of the data but I’m struggling to find more information about the actual details.

Thanks in advance!
Miguel.

rafael.guerra · April 24, 2025, 8:01pm

Looking at the code, I think that by default the violin plots in StatsPlots.jl don’t extend beyond the minimum and maximum values in the dataset. To change that default, set the keyword argument trim=false in the violin() call.
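
To make the idea of trimming concrete, here is a minimal sketch of a kernel density estimate that is either kept in full or cut at the sample minimum and maximum. The `gaussian_kde` helper, the rule-of-thumb bandwidth, and the toy data are all assumptions for illustration; this is not the code StatsPlots.jl actually runs.

```julia
using Statistics

# Gaussian kernel density estimate of `data` evaluated at each point of `grid`.
# Hand-rolled for illustration only (hypothetical helper, rule-of-thumb bandwidth).
function gaussian_kde(data, grid; bandwidth = 1.06 * std(data) * length(data)^(-1/5))
    n = length(data)
    return [sum(exp(-0.5 * ((g - d) / bandwidth)^2) for d in data) /
            (n * bandwidth * sqrt(2π)) for g in grid]
end

data = [0.0, 0.5, 1.0, 1.2, 2.0, 2.1, 3.0]   # toy sample
lo, hi = extrema(data)

# Untrimmed: the grid extends past the data, so the tails of the estimate are kept.
full_grid = range(lo - 2, hi + 2; length = 201)
full_density = gaussian_kde(data, full_grid)

# Trimmed: keep only the part of the curve between the sample minimum and maximum.
keep = lo .<= full_grid .<= hi
trimmed_grid = full_grid[keep]
trimmed_density = full_density[keep]
```

With StatsPlots.jl loaded, the corresponding call would be something like `violin(data; trim = false)` to keep the tails.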

miguelborrero · April 24, 2025, 8:32pm

Thanks for the answer! I will look closer into the code. The problem in my case is that it’s truncating the lower support above the minimum. It might be an issue with outliers.

rafael.guerra · April 24, 2025, 8:43pm

In a quick test comparison with boxplot I could not see any problem as the (default) violin plot extended up to the outliers.

miguelborrero · April 24, 2025, 9:07pm

Yep, it might only be failing in my case, where the data is not very well behaved since I have some very large outliers. I cannot provide a MWE, but just to illustrate:

df |>
    x -> @df x boxplot(:time_period, :max_down_speed, legend = false, title = "CAF II model", showaxis = :y, outliers = false)
df |>
    x -> @df x violin!(:time_period, :max_down_speed)

Gives:

Benny · April 24, 2025, 11:35pm

The 4 violin plots on the right not overlapping much with the boxplots does seem strange. Check the ECDFs to see if either the violin or box plots are off.

miguelborrero · April 25, 2025, 12:50am

I know for a fact that my data contains zeros for every category, and this is clearly not reflected in the violin plots, which I think are off.

Benny · April 25, 2025, 1:58am

The violin plots could be very thin rather than truncated if the point density drops off sharply there; note how the middle 4 have smaller diamonds above the main violin. However, that seems to be contradicted by the boxplot’s quartiles, so I suggest checking the ECDFs to eyeball the density and quartiles. ECDFs don’t look good, but a simple and transparent line plot of sort(x), (1:length(x)) ./ length(x) dodges the options, cutoffs, and possible implementation issues of the boxplot or violin plot. You should be able to tell which is wrong, if either.
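
A minimal sketch of that check, using only Base and the Statistics standard library. The sample `x` is made up (many zeros plus a few large outliers), and `ecdf_points` is a hypothetical helper name; substitute one category of the real data.

```julia
using Statistics

# ECDF coordinates as described above: sorted values against cumulative fraction.
ecdf_points(x) = (sort(x), (1:length(x)) ./ length(x))

# Made-up sample with many zeros and a few large outliers (illustrative only).
x = vcat(zeros(50), fill(10.0, 40), [500.0, 800.0, 1000.0])

xs, ps = ecdf_points(x)

# Fraction of observations sitting exactly at the minimum. A large value here
# means a violin that stops above the minimum is hiding real mass.
frac_at_min = count(==(minimum(x)), x) / length(x)

# Quartiles to compare against the box drawn by the boxplot.
q1, q2, q3 = quantile(x, [0.25, 0.5, 0.75])
```

With Plots.jl loaded, `plot(xs, ps)` draws the curve; repeating this for each category gives one line per group to compare against the boxes and violins.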

Liozou · April 25, 2025, 11:27am

To complement the answer above, you could also try with Makie to compare?

using GLMakie

function plot_stats_line!(ax, data, xpos)
    xs = fill(xpos, length(data))
    violin!(ax, xs, data)
    boxplot!(ax, xs, data; color=Makie.wong_colors(0.7)[2])
end

function plot_stats(data) # plot the distribution of vector `data`
    f = Figure()
    ax = Axis(f[1,1])
    plot_stats_line!(ax, data, 1) # call this with different values of xpos to put multiple lines on the same plot
    f
end

If you take something with large outliers like

data = randn(2000)
append!(data, 30*rand() for _ in 1:10);

then calling plot_stats(data) will show:

where it may look like the violin truncates the support, but if you zoom in you will see that each outlier is actually covered:
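
A rough way to see why the violin looks cut off is to compare the estimated density at an isolated outlier with the density at the peak, since the violin's width is proportional to it. The sketch below assumes a hand-rolled Gaussian KDE with a rule-of-thumb bandwidth (the hypothetical `kde_at` helper); Makie's own estimator and bandwidth may differ.

```julia
using Random, Statistics

# Gaussian KDE evaluated at a single point `g` (illustrative helper, not Makie's code).
function kde_at(data, g, bandwidth)
    n = length(data)
    return sum(exp(-0.5 * ((g - d) / bandwidth)^2) for d in data) /
           (n * bandwidth * sqrt(2π))
end

rng = MersenneTwister(42)
bulk = randn(rng, 2000)            # well-behaved bulk of the data
outlier = 30.0                     # one isolated, very large value
data = vcat(bulk, outlier)

# Rule-of-thumb bandwidth (an assumption for this sketch).
h = 1.06 * std(data) * length(data)^(-1/5)

peak_density = maximum(kde_at(data, g, h) for g in range(-4, 4; length = 81))
outlier_density = kde_at(data, outlier, h)

# Width of the violin at the outlier relative to its widest point.
relative_width = outlier_density / peak_density
```

Under these assumptions `relative_width` comes out as a small fraction of one, so the violin is still drawn at the outlier but is far too thin to see without zooming.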

Of course, if there is a bug in StatsPlots, it should be reported there regardless.
