Side by side violin plots with VegaLite.jl

Hi

I want to present some data to a colleague, whilst also showing off Julia. I have data for an effect that varies according to species and treatment. There are 2 treatments and I would like to display effect sizes as pairs of violin plots, a single pair for each species, coloured according to treatment, all in the same plotting area.

Additionally, I want to share the code that generated the plots, so it would be great to be able to produce something that looks nice and has an appropriate legend etc. with few lines of code.

The code below produces something decent enough (not “publication quality”, which is obviously in the eye of the beholder anyway!). But for each species, the 2 violins are stacked on top of each other, which makes the plot difficult to interpret. Incidentally, I’m getting a “Canvas Renderer is missing” error when I try to save the output, so I can’t display the plot here.

using VegaLite, DataFrames

# Toy data: 3 species x 2 treatments, 1000 random values
mydf = DataFrame(species = rand(["a","b","c"], 1000),
                 treatment = rand(["x1","x2"], 1000),
                 value = rand(1000))
mydf |> @vlplot(
    mark={:area, orient="horizontal"},
    transform=[
        {density="value", groupby=["treatment","species"],
        as=[ "value", "density"]}
    ],
   
    y="value:q",
    x= {"density:q", impute=nothing,stack = "center", title=nothing,
        axis={ values=[], grid=false, ticks=true}},
    color = "treatment:n",
    width=70,
    spacing=0,
    column = :species,
    config={view={stroke=nothing}}
)
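One possible VegaLite.jl workaround is to draw split violins directly. This is a sketch, not tested against a specific VegaLite.jl version. It keeps the per-group `density` transform, then adds a `calculate` transform that negates the density for treatment "x1" and turns stacking off, so x1 is drawn to the left of each species' centre line and x2 to the right. Vega-Lite's density transform returns probability densities by default (each group integrates to 1), so the two halves should also get equal areas. The column name `signed_density` is my own choice.

```julia
using VegaLite, DataFrames

# Same toy data as above: 3 species x 2 treatments
mydf = DataFrame(species = rand(["a", "b", "c"], 1000),
                 treatment = rand(["x1", "x2"], 1000),
                 value = rand(1000))

p = mydf |> @vlplot(
    mark = {:area, orient = "horizontal"},
    transform = [
        # One KDE per (species, treatment) group
        {density = "value", groupby = ["treatment", "species"],
         as = ["value", "density"]},
        # Mirror treatment "x1" to the left of the centre line
        {calculate = "datum.treatment == 'x1' ? -datum.density : datum.density",
         as = "signed_density"}
    ],
    y = "value:q",
    # stack = nothing (null): draw each half from zero instead of stacking them
    x = {"signed_density:q", stack = nothing, impute = nothing, title = nothing,
         axis = {values = [], grid = false, ticks = true}},
    color = "treatment:n",
    width = 70,
    spacing = 0,
    column = :species,
    config = {view = {stroke = nothing}}
)
```

The only changes from the original spec are the extra `calculate` step and `stack = nothing` in place of `stack = "center"`.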

I’m not tied to using VegaLite. But I started using it recently, as it seemed to be the most straightforward package for creating faceted plots from dataframes. This is useful for me in data exploration, but also something my colleagues who all use R would feel at home with.

A promising alternative is using split violins in CairoMakie

using CairoMakie

xs1 = rand(["a", "b", "c"], 1000)
ys1 = randn(1000)
dodge1 = rand(1:2, 1000)

xs2 = rand(["a", "b", "c"], 1000)
ys2 = randn(1000)
dodge2 = rand(1:2, 1000)

fig = Figure()
ax = Axis(fig[1, 1])
violin!(ax, xs1, ys1, dodge = dodge1, side = :left, color = "orange")
violin!(ax, xs2, ys2, dodge = dodge2, side = :right, color = "teal")
fig

which produces

[Figure: split violin plot of the random example data]


Looks nice, I think.

However, with my actual data, it looks like this:

[Figure: the same split violin plot with the real data]


Seems off to me. Because of differences in variance, the two halves of the same violin have very different total areas, despite being based on the same amount of data. It would be nice to have the area preserved, i.e. the more spread-out treatments would also be shallower.


I think for the VegaLite.jl case we’ll just have to wait until this is implemented…


By coincidence, I was discussing this on Slack today.

Makie (and StatsPlots as well) rescales distributions so that they all have the same “maximum density value”, but not the same area. This may be a bad idea; we should probably change it to match the behavior of other plotting packages (they mostly seem to give the same area to all classes).
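To make the difference concrete, here is a small stand-alone sketch in plain Julia. It does not use Makie's or StatsPlots' internals; the simple Gaussian KDE with Silverman's rule-of-thumb bandwidth and the helper names are my own assumptions. Two groups have the same number of points but different spreads. Scaling each density to the same maximum gives the narrow group a much smaller area, while dividing both densities by one common factor keeps their areas equal.

```julia
using Random, Statistics

Random.seed!(1)

# Minimal Gaussian KDE evaluated on a grid (Silverman's rule-of-thumb bandwidth)
function kde_on_grid(x::AbstractVector{<:Real}, grid::AbstractVector{<:Real})
    h = 1.06 * std(x) * length(x)^(-1 / 5)
    return [sum(exp(-((g - xi) / h)^2 / 2) for xi in x) / (length(x) * h * sqrt(2π))
            for g in grid]
end

# Trapezoidal-rule area under a curve sampled at `grid`
curve_area(grid, y) = sum((grid[i+1] - grid[i]) * (y[i] + y[i+1]) / 2
                          for i in 1:length(grid)-1)

# Two groups with the same number of points but different spread
narrow = 0.2 .* randn(500)
wide = 2.0 .* randn(500)
grid = range(-10, 10; length = 2001)
d_narrow = kde_on_grid(narrow, grid)
d_wide = kde_on_grid(wide, grid)

# "Same maximum" scaling: every half-violin gets the same peak width
max_scaled(d) = d ./ maximum(d)

# "Same area" scaling: one common factor for all groups, so equal areas stay equal
common = max(maximum(d_narrow), maximum(d_wide))
area_scaled(d) = d ./ common

println("same max:  ", curve_area(grid, max_scaled(d_narrow)), " vs ",
        curve_area(grid, max_scaled(d_wide)))
println("same area: ", curve_area(grid, area_scaled(d_narrow)), " vs ",
        curve_area(grid, area_scaled(d_wide)))
```

With "same maximum" the wide group's area comes out roughly ten times the narrow one's, which is the effect seen in the real-data plot above.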

A quick violin plot for your input data example using StatsPlots.jl:

using DataFrames, StatsPlots

# Toy data: 3 species x 2 treatments, 1000 random values
mydf = DataFrame(species = rand(["a","b","c"], 1000),
                 treatment = rand(["x1","x2"], 1000),
                 value = rand(1000))

mydf_x1 = filter(row -> row.treatment == "x1", mydf)
mydf_x2 = filter(row -> row.treatment == "x2", mydf)
@df mydf_x1 violin(:species, :value, side=:left, linewidth=0, label="x1")
@df mydf_x2 violin!(:species, :value, side=:right, linewidth=0, label="x2")

Plots.plot!(legendfontsize=9, legendtitle="Treatment", legend=:outertopright,
            guidefontsize=10, xlabel="Species", ylabel="value")

[Figure: StatsPlots split violin plot of the example data]

Thank you. I think I’ll give up on violins for now and use points with confidence-interval error bars, like this example: Error Bars & Error Bands · VegaLite.jl
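A sketch of that approach for the toy data, following the layering pattern of the VegaLite.jl error-bar example (a shared parent spec added to child layers with `+`). To keep the two treatments of a species from sitting on top of each other, it builds a combined `group` column; that column name is my own choice, not part of the original data.

```julia
using VegaLite, DataFrames

mydf = DataFrame(species = rand(["a", "b", "c"], 1000),
                 treatment = rand(["x1", "x2"], 1000),
                 value = rand(1000))

# Hypothetical helper column: one x position per species/treatment pair
mydf.group = mydf.species .* "-" .* mydf.treatment

# Shared x/color encodings, then an error-bar layer and a mean-point layer
p = mydf |>
    @vlplot(x = "group:n", color = "treatment:n") +
    @vlplot(mark = {:errorbar, extent = :ci}, y = {"value:q", title = "value"}) +
    @vlplot(mark = {:point, filled = true}, y = "mean(value)")
```

Here `extent = :ci` lets Vega-Lite compute the interval itself, so there is no extra aggregation code on the Julia side.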

I’m a bit confused by extent=:ci. Is there a way to state the percentage you want to use? I’m assuming the default is 95%?

Thanks for this! Very simple and looks nice. Unfortunately it doesn’t look great with my real data. I don’t think I’ll use violins in the end.

Could you provide a sample of your real data? Or a more realistic proxy.
It might be possible to normalize it prior to displaying.

Update: the normalization issue will get fixed in Makie, but it may take a while for the PR to be reviewed, merged and tagged: support datalimits and respect area in violin by piever · Pull Request #730 · JuliaPlots/AbstractPlotting.jl (github.com)

(In particular, it seems that the Cairo backend is quite slow at drawing these violins, and it’d be nice to take advantage of this refactor to figure out if there is a more efficient way.)
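Once a Makie release includes that change, a split violin with different spreads per treatment could be sketched as below. Assumptions: integer category codes instead of strings (species names are put back as tick labels), per-observation `side` and `color` vectors as in the Makie documentation's split-violin example, and `datalimits = extrema` to trim each half to the observed data range. I haven't checked which Makie version first shipped each of these, so check `?violin` in your installed version.

```julia
using CairoMakie

N = 1000
categories = rand(1:3, N)           # species a, b, c encoded as 1, 2, 3
side = rand([:left, :right], N)     # treatment x1 -> left half, x2 -> right half
# Different spreads per treatment, mimicking the real data
ys = [s === :left ? 0.2 * randn() : 2.0 * randn() for s in side]
color = [s === :left ? :orange : :teal for s in side]

fig = Figure()
ax = Axis(fig[1, 1]; xticks = (1:3, ["a", "b", "c"]))
# datalimits = extrema: do not draw density beyond the observed min/max
violin!(ax, categories, ys; side = side, color = color, datalimits = extrema)
fig
```

Recent Makie documentation also lists a `scale` attribute (`:area`, `:count`, `:width`) for choosing how densities are normalized; treat that as something to verify in your version's docstring.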

Of course. What is the best way to share data?

You may use GitHub or a free service such as dropbox.com, sync.com, etc.

https://github.com/EvoArt/GrowthRates/raw/main/sample_dat.csv
Thanks for your help.

The documentation for the extent property is here. It seems to say that ci uses the standard confidence-interval aggregates described here (ci0 and ci1), which are the lower and upper bounds of a bootstrapped 95% confidence interval of the mean. I don’t see an option there to compute that for other levels. So I think if you wanted other bounds, you would have to do the aggregation yourself before feeding the data into VegaLite.jl, and then use the techniques described here to create the plot.
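A sketch of that pre-aggregation route, assuming a simple percentile bootstrap is acceptable. `bootstrap_ci`, the 90% level and the `avg`/`lo`/`hi`/`group` column names are hypothetical choices, not part of VegaLite.jl or DataFrames.jl. The pre-computed bounds are then drawn with Vega-Lite's `errorbar` mark using `y` and `y2`.

```julia
using DataFrames, Random, Statistics, VegaLite

# Hypothetical helper: percentile bootstrap CI of the mean at a chosen level
function bootstrap_ci(x::AbstractVector{<:Real}; level = 0.90, nboot = 2000,
                      rng = Random.default_rng())
    means = [mean(rand(rng, x, length(x))) for _ in 1:nboot]  # resample with replacement
    α = (1 - level) / 2
    return (lo = quantile(means, α), hi = quantile(means, 1 - α))
end

mydf = DataFrame(species = rand(["a", "b", "c"], 1000),
                 treatment = rand(["x1", "x2"], 1000),
                 value = rand(1000))

# One row per species/treatment: mean plus lower/upper CI bounds
summ = combine(groupby(mydf, [:species, :treatment]),
               :value => mean => :avg,
               :value => (v -> bootstrap_ci(v; level = 0.90)) => AsTable)
summ.group = summ.species .* "-" .* summ.treatment

# Error bar from pre-computed lo/hi, plus the mean as a point
p = summ |>
    @vlplot(x = "group:n", color = "treatment:n") +
    @vlplot(mark = :errorbar, y = {"lo:q", title = "value"}, y2 = :hi) +
    @vlplot(mark = {:point, filled = true}, y = "avg:q")
```

Changing `level` is then the only thing needed to get, say, an 80% or 99% interval.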


@EvoArt, thanks for sharing the data.

Unfortunately, I could not find a proper violin solution at my level of StatsPlots experience. The scaling options for the probability density functions may have to be built into the package code.

With the current version of StatsPlots, the best violins for your data came from wrapping the violin() calls from the example above in plot() calls with tighter ylims:

[Figure: StatsPlots split violin plot of the shared data with tighter ylims]
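A minimal version of the tighter-ylims idea on the toy data, using `plot!` to set the limits after the two `violin` calls (possibly not exactly how the plot above was made). Taking the central 98% of the values as limits is an arbitrary, hypothetical choice; note that ylims only crops the view, the densities themselves are unchanged.

```julia
using DataFrames, StatsPlots, Statistics

mydf = DataFrame(species = rand(["a", "b", "c"], 1000),
                 treatment = rand(["x1", "x2"], 1000),
                 value = rand(1000))
mydf_x1 = filter(row -> row.treatment == "x1", mydf)
mydf_x2 = filter(row -> row.treatment == "x2", mydf)

# Hypothetical choice of limits: the central 98% of all values
lims = (quantile(mydf.value, 0.01), quantile(mydf.value, 0.99))

@df mydf_x1 violin(:species, :value, side = :left, linewidth = 0, label = "x1")
@df mydf_x2 violin!(:species, :value, side = :right, linewidth = 0, label = "x2")
p = plot!(ylims = lims, legendtitle = "Treatment", legend = :outertopright,
          xlabel = "Species", ylabel = "value")
```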

Thanks David, I’ve found this package really useful for exploring data.

Thanks for this. The difference in area doesn’t seem so offensive with the tighter ylims!
