Why are these violin plots "clipped"?

Hi all, I produced some violin plots from some data I have:

df_temp = DataFrame(mnemonic = df_model_master[:, :mnemonic],
    instruction = df_model_master[:, instruction_sym],
    shift = df_model_master[:, has_barrel_shift_sym],
    value = df_model_master[:, mean_power_sym] .* u"W"
)
# keep only the instructions of interest
selected_instructions = ["add", "uxtb", "uxth"]
df_filtered = filter(:mnemonic => m -> m ∈ selected_instructions, df_temp)
# split by whether the barrel shifter is active
df_pos = filter(:shift => identity, df_filtered)
df_neg = filter(:shift => x -> !x, df_filtered)

p = violin(0)
# left half of each violin: barrel shift not active
p = @df df_neg violin!(:mnemonic, :value;
    # xlabel = "Instructions",
    ylabel = "Measured power",
    side = :left, label = "not active",
    left_margin = 5mm,
    bottom_margin = 3mm,
    size = (800, 500),
    tickfontsize = 14,
    guidefontsize = 16
)
# right half of each violin: barrel shift active
p = @df df_pos violin!(:mnemonic, :value;
    # xlabel = "Instructions",
    ylabel = "Measured power",
    side = :right, label = "active",
    left_margin = 5mm,
    bottom_margin = 3mm,
    size = (800, 500),
    tickfontsize = 14,
    guidefontsize = 16
)

What I obtain is this:


As you can see, the violins look a bit… “clipped” at the ends (top and bottom). Why is that? And can I “smooth” them out? Thanks!

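Violin plots are always clipped somewhere (I think to the range of the input data, or maybe to that range plus some margin), because the curve is a kernel density estimate: a sum of Gaussians centered on the data points. Gaussians have full support, i.e. they are positive everywhere, so the curve never reaches zero on its own and has to be cut off at some point.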
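For reference, this is the textbook Gaussian kernel density estimate with samples x_1, …, x_n and bandwidth h (which bandwidth rule a given plotting package picks is not specified here and may differ):

$$
\hat f_h(x) = \frac{1}{n h} \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h}\right),
\qquad
\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}
$$

Because φ(u) > 0 for every u, the estimate is positive on the whole real line, so any plot of it has to stop at some finite range.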

The clipping is more obvious for data like yours, which is not well approximated by a normal distribution.
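To make the point concrete, here is a minimal sketch in plain Julia (no plotting package) that evaluates a Gaussian kernel density estimate by hand at the edge of the data. The helper `gaussian_kde` is hypothetical, the bandwidth `h = 1.0` is an assumption chosen for illustration, and the data are the first group from the Python example below.

```julia
# Hypothetical helper (not from any package): Gaussian kernel density
# estimate at `x`, i.e. the average of normal densities with standard
# deviation `h` centered on each sample.
function gaussian_kde(x, data, h)
    s = 0.0
    for d in data
        s += exp(-((x - d) / h)^2 / 2) / (h * sqrt(2π))
    end
    return s / length(data)
end

data = [3, 3, 3, 3, 7, 7, 7, 7]   # first group of the Python example
h = 1.0                            # assumed bandwidth, for illustration only

lo, hi = extrema(data)
# peak of the curve, searched on a fine grid inside the data range
peak = maximum(gaussian_kde(x, data, h) for x in range(lo, hi; length = 401))
# value of the curve exactly at the smallest sample, where a violin trimmed
# to the data range ends
edge = gaussian_kde(lo, data, h)

# For this two-cluster data the ratio is essentially 1: the trimmed violin
# ends at (almost) its widest point, which is what reads as "clipped".
println("density at data minimum / peak density = ", round(edge / peak; digits = 3))
```

For data that is closer to normal, most samples sit in the middle, so the density at the extremes is much smaller than the peak and the cut is less noticeable.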

Python (Matplotlib) does the same thing:

import matplotlib.pyplot as plt
plt.violinplot([[3, 3, 3, 3, 7, 7, 7, 7], [2, 2, 4, 4]])
plt.show()  # needed to display the figure when run as a script


Since your script doesn’t include the using/import statements, I can’t tell which plotting library you are using, but check the documentation of its violin! function for a scale (bandwidth) parameter that lets you reduce the standard deviation of the Gaussians. That should make the clipped ends less noticeable.


With GMT they are not clipped.

https://www.generic-mapping-tools.org/GMTjl_doc/examples/plotting_functions/05_1_stats/#violin_plots

Sure they are:

using GMT
violin(rand((2, 3, 11), 10, 3), fill=true, show=true)

This is a fundamental property of violin plots.

I don’t think it’s really fundamental; with Makie, for example, you can choose. Translating the Python example above to Makie syntax,

using CairoMakie
violin([fill(1, 8); fill(2, 4)], [3, 3, 3, 3, 7, 7, 7, 7, 2, 2, 4, 4])

gives


while

violin([fill(1, 8); fill(2, 4)], [3, 3, 3, 3, 7, 7, 7, 7, 2, 2, 4, 4]; datalimits=extrema)

gives


This is pedantic, but even those are clipped, just at a point on the Gaussian curve where the function is already close to zero 🙂

(And reasonable people may disagree whether those plots are a better representation of the underlying distribution than the more tightly clipped ones.)
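To put a number on the “close to zero” point, the sketch below extends the evaluation range a fixed margin beyond the data and compares the density where that range ends with the peak. The helper `gaussian_kde`, the bandwidth `h = 1.0`, and the margin of four bandwidths are all assumptions for illustration; the range Makie (or the density package it uses) actually evaluates over may differ.

```julia
# Hypothetical helper (not from any package): Gaussian kernel density
# estimate, the average of normal densities with std `h`, one per sample.
gaussian_kde(x, data, h) =
    sum(exp(-((x - d) / h)^2 / 2) for d in data) / (length(data) * h * sqrt(2π))

data = [3, 3, 3, 3, 7, 7, 7, 7]   # first group of the Python example
h = 1.0                            # assumed bandwidth
margin = 4h                        # assumed margin beyond the data, illustration only

lo, hi = extrema(data)
xs = range(lo - margin, hi + margin; length = 801)
peak = maximum(gaussian_kde(x, data, h) for x in xs)
tail = gaussian_kde(last(xs), data, h)   # value where the extended curve is cut

# The tail is tiny but still nonzero at the cut: roughly exp(-8), about 3e-4 of the peak.
println("tail / peak = ", tail / peak)
```

So the “unclipped” violin is still cut off, just at a height of a few hundredths of a percent of its maximum width, which is invisible at normal plot sizes.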


You caught me with that one (I would need to check why it’s doing that), but the examples I linked above show no clipping. Or, as you said, they clip the density function very close to zero.

Thanks, all! So I guess the real underlying answer is that my data set is too small to produce a nice-looking violin plot. No problem, though, if it’s supposed to look this way.