How to properly plot DataFrames? (... and simplify my call to violin in StatPlots.)

I’m looking for a more concise way to call violin. I just want one violin per category/column of data, with all the labeling consistent with the DataFrame column names.

# some data to play with
n = 10000
f(k) = randn(k)
df2 = DataFrame(hcat(f(n).+1, f(n).-1, 2*f(n), 0.5*f(n).-3, 2*f(n).^2, abs.(f(n))),
    map(num->Symbol("Set",num),4:9));
# Now plot a violin of each "Set" on the same graph
violin(hcat(convert(Vector{String}, df2.colindex.names)...),
    hcat(df2.columns...),
    labels=df2.colindex.names,
    trim=false)

I think this is an error where the plot command wants the labels to match the dimensions of the data. Since your data is concatenated horizontally, it wants the labels to be a row vector. Try reshape(df2.colindex.names, 1, :), or better yet reshape(names(df2), 1, :).

There is a PR out there to have labels accept regular vectors but I can’t find it at the moment.
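To make the row-vector point concrete, here is a minimal sketch in plain Julia. The names in col_names are hypothetical stand-ins for the DataFrame's column names, and nothing here touches Plots itself; it only shows the shape change.

```julia
# Hypothetical column names standing in for the DataFrame's names
col_names = [:Set4, :Set5, :Set6]

# A plain Vector has a single dimension; reshape it to 1×3 so that
# there is one label per column of horizontally concatenated data
row_labels = reshape(col_names, 1, :)

size(col_names)   # (3,)
size(row_labels)  # (1, 3)

# permutedims on a vector produces the same 1×3 layout
permutedims(col_names) == row_labels  # true
```

Either form should do if a 1×n label array is what the plot call expects.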

Usually you would use the @df macro to work with DataFrames, but it assumes a different (‘long’) data format. In your case I would do:

violin(convert(Matrix, df2), xticks=(1:ncol(df2), names(df2)), legend=false)
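For readers on newer package versions, a rough equivalent might look like the sketch below. It assumes the current package name StatsPlots (StatPlots was later renamed) and a recent DataFrames.jl, where Matrix(df) replaces convert(Matrix, df); the small table is made up for illustration.

```julia
using DataFrames, StatsPlots  # assumption: current package names

# Small made-up table, one column per set
df = DataFrame(Set4 = randn(100) .+ 1, Set5 = randn(100) .- 1, Set6 = 2 .* randn(100))

# Each column of the matrix becomes one violin
data = Matrix(df)

# Tick positions paired with the column names used as tick labels
ticks = (1:ncol(df), names(df))

plt = violin(data, xticks = ticks, legend = false)
```

The xticks tuple is what ties the numeric positions 1:ncol(df) back to the column names.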

Thanks! This does produce the desired plot!

Is the “long” data format stack(df2)? If so, how would I use @df? I’ve tried various @df incantations that either spit out errors or didn’t make the plot I want.

To convert to the long format that a violin plot needs, you can simply use stack, e.g.

df2 |> stack |> @df violin(:variable, :value)

You can look at the result of stack(df2) to see what “long format” refers to.
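For a concrete picture of what “long format” means, here is a minimal sketch on a tiny made-up table. The column names and values are invented for illustration, and it assumes a recent DataFrames.jl, where stack with no extra arguments stacks all floating-point columns.

```julia
using DataFrames

# Tiny made-up wide table: one column per set
wide = DataFrame(Set4 = [1.0, 2.0, 3.0], Set5 = [4.0, 5.0, 6.0])

# Long format: one row per (column name, value) pair
long = stack(wide)

names(long)  # column names of the stacked table: "variable" and "value"
nrow(long)   # 6 rows: 3 values for each of the 2 original columns
```

The variable column holds the original column names and the value column holds the data, which is why the @df call refers to :variable and :value.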


I tried that, but I get an error:

No user recipe defined for Symbol

Stacktrace:
 [1] copy!(::Array{Any,1}, ::Base.Generator{Array{Symbol,1},Plots.##223#225{Dict{Symbol,Any}}}) at ./abstractarray.jl:573
 [2] convertToAnyVector(::Array{Symbol,1}, ::Dict{Symbol,Any}) at /home/rick/.julia/v0.6/Plots/src/series.jl:58
 [3] macro expansion at /home/rick/.julia/v0.6/Plots/src/series.jl:131 [inlined]
 [4] apply_recipe(::Dict{Symbol,Any}, ::Type{Plots.SliceIt}, ::Array{Symbol,1}, ::Array{Float64,1}, ::Void) at /home/rick/.julia/v0.6/RecipesBase/src/RecipesBase.jl:291
 [5] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Array{Symbol,1},Array{Float64,1}}) at /home/rick/.julia/v0.6/Plots/src/pipeline.jl:81
 [6] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Array{Symbol,1},Array{Float64,1}}) at /home/rick/.julia/v0.6/Plots/src/plot.jl:179
 [7] #plot#212(::Array{Any,1}, ::Function, ::Array{Symbol,1}, ::Vararg{Any,N} where N) at /home/rick/.julia/v0.6/Plots/src/plot.jl:58
 [8] (::RecipesBase.#kw##plot)(::Array{Any,1}, ::RecipesBase.#plot, ::Array{Symbol,1}, ::Array{Float64,1}, ::Vararg{Array{Float64,1},N} where N) at ./<missing>:0
 [9] #violin#766(::Array{Any,1}, ::Function, ::Array{Symbol,1}, ::Vararg{Any,N} where N) at /home/rick/.julia/v0.6/RecipesBase/src/RecipesBase.jl:385
 [10] violin(::Array{Symbol,1}, ::Vararg{Any,N} where N) at /home/rick/.julia/v0.6/RecipesBase/src/RecipesBase.jl:385
 [11] #add_label#15(::Array{Any,1}, ::Function, ::Array{String,1}, ::Function, ::Array{Symbol,1}, ::Vararg{Any,N} where N) at /home/rick/.julia/v0.6/StatPlots/src/df.jl:129
 [12] (::##75#76)(::DataFrames.DataFrame) at ./<missing>:0
 [13] |>(::DataFrames.DataFrame, ::##75#76) at ./operators.jl:862
 [14] include_string(::String, ::String) at ./loading.jl:522

Somehow I think that should work though, could you open an issue over at StatPlots?

Btw, to be practical, the best way to use @df here is with cols, which lets you select columns (cols(1:2) selects the first two, cols() selects all of them, etc.).

So in this case:

@df df2 violin(cols())

or for example:

@df df2 violin(cols(1:3))

If you only want some columns.

The other method via stack requires you to manually convert one column from Symbol to String (which I believe is not ideal; Plots should take care of that automatically):

df2 |> stack |> @df violin(string.(:variable), :value)

Thanks! To get the x-axis labels I want, is this the best way?

@df df2 violin(cols(),xticks=(1:ncol(df2), names(df2)))

Also, df2 |> stack |> @df violin(string.(:variable), :value) only plots in one color and the legend shows only one item.

Yes, I think that’s it, though I’m not sure why Plots would give you a numeric x axis by default.
