I would like to use PlotlyJS.jl to generate a composite figure with a violin plot and a scatter plot, where the scatter plot is overlaid on the violins and shows the mean ± 95% confidence interval of the mean. My current attempt works with two groups, but only if the x-axis data are integers starting from 1. If the x-axis data are in any other format (including strings), the violins and scatter points are no longer aligned.
I thought to reach out to the community in the event someone has attempted this before. My code is the following:
#' # imports
using DataFrames, Statistics, HypothesisTests, PlotlyJS, ORCA
#' # build dataframe
a = rand(80);
label1 = repeat(1:1:4, inner=20, outer=1);
#label1 = repeat(0:1:3, inner=20, outer=1);
#label1 = repeat(["x1", "x2", "x3", "x4"], inner=20, outer=1);
label2 = repeat(["a", "b"], inner=1, outer=40);
df = DataFrame(a=a, label1=label1, label2=label2);
#' # plotlyjs.jl plotting function, composite violin and 95% mean confidence interval
function violintwogroups(;df::DataFrame, x::Symbol, y::Symbol, legend::Symbol, xtitle::String, ytitle::String, figurename::String="newfigure", width::Int=1000, height::Int=600)
firstgroupname = unique(df[:, legend])[1];
secondgroupname = unique(df[:, legend])[2];
df1 = df[df[!, legend] .== firstgroupname, :];
df2 = df[df[!, legend] .== secondgroupname, :];
xcounts = length(unique(df[:, x]));
v1 = violin(;x=df1[:, x], y=df1[:, y], color=:blue, name=firstgroupname, points=:all, meanline_visible=true, pointpos=0)
# per-x-value mean and half-width of the 95% CI of the mean, first group
means1 = combine(groupby(df1, x), y => mean)[:, 2];
erry1 = [(maximum(confint(OneSampleTTest(g[:, y]), 0.05; tail=:both)) - mean(confint(OneSampleTTest(g[:, y]), 0.05; tail=:both))) for g in groupby(df1, x)]
s1 = scatter(;x=collect(0.825:1:(xcounts-1+0.825)), y=means1, mode="markers", error_y=attr(;type="data", array=erry1, visible=true, thickness=2.75, color="black"), name=firstgroupname*" mean ± 95% CI", marker_size=10, marker_symbol="square", marker_color="black", showlegend=false)
v2 = violin(;x=df2[:, x], y=df2[:, y], color=:orange, name=secondgroupname, points=:all, meanline_visible=true, pointpos=0)
# per-x-value mean and half-width of the 95% CI of the mean, second group
means2 = combine(groupby(df2, x), y => mean)[:, 2];
erry2 = [(maximum(confint(OneSampleTTest(g[:, y]), 0.05; tail=:both)) - mean(confint(OneSampleTTest(g[:, y]), 0.05; tail=:both))) for g in groupby(df2, x)]
s2 = scatter(;x=collect(1.175:1:(xcounts+0.175)), y=means2, mode="markers", error_y=attr(;type="data", array=erry2, visible=true, thickness=2.75, color="black"), name=secondgroupname*" mean ± 95% CI", marker_size=10, marker_symbol="square", marker_color="black", showlegend=false)
data=[v1,v2,s1,s2]
layout=Layout(yaxis_title=ytitle, xaxis=attr(title=xtitle,tickmode="array", tickvals=1:1:xcounts, ticktext=unique(df[:, x])), violinmode="group", width=width, height=height)
p = plot(data, layout)
savefig(p, figurename*".png", scale=3)
savehtml(p, figurename*".html", :remote)
p
end
#' # execute function
violintwogroups(df=df, x=:label1, y=:a, legend=:label2, xtitle="the x-axis", ytitle="the y-axis")
This should produce the following figure:
But if you uncomment one of the other options for “label1”, such as the range 0:1:3, the violins and the scatter points will be offset and no longer aligned. (I would post the second image, but I am limited to one image as a new user.)
I tried searching the PlotlyJS reference for violin, but could not figure out how to control the x-positions of the violins. Does anyone know how I might make this function more robust so that the x-input could be anything (including strings, ideally) and the violins and scatter will always be aligned?
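One direction I have been considering, as a rough sketch only: map whatever is in the x column to integer positions 1, 2, 3, ... first, and then plot both the violins and the scatter points at those numeric positions, keeping the original labels as tick text. The helper name `label_positions` is made up for illustration, and the ±0.175 offset is simply copied from the hard-coded 0.825 and 1.175 in my function above; I have not confirmed that this offset is what PlotlyJS uses in general.

```julia
# Hypothetical helper (not part of PlotlyJS): map arbitrary x labels to 1, 2, 3, ...
# in order of first appearance, so plotted positions do not depend on the label type.
function label_positions(labels::AbstractVector)
    levels = unique(labels)
    index = Dict(level => i for (i, level) in enumerate(levels))
    positions = [index[l] for l in labels]
    return levels, positions
end

# Works the same for strings, 0-based integers, or any other label type.
levels, pos = label_positions(repeat(["x1", "x2", "x3", "x4"], inner=2))

# Assumed offset for two groups, taken from the hard-coded values above
# (0.825 = 1 - 0.175 and 1.175 = 1 + 0.175).
offset = 0.175
scatter_x_first = collect(1:length(levels)) .- offset
scatter_x_second = collect(1:length(levels)) .+ offset
```

The idea would be to pass `pos` as the violin x values, the shifted vectors as the scatter x values, and `levels` as `ticktext` with `tickvals=1:length(levels)`, but I do not know whether that is the intended way to do this.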
Thanks for any thoughts.