How to produce nice boxplots

Does anyone know how to produce nice boxplots out of a dataframe of this form?

│ Row │ Smarcc1 │ Socs3   │ Sox1 │ Sox2    │ Stat3   │ T   │ Tbx3    │ Tcf7l1  │
├─────┼─────────┼─────────┼──────┼─────────┼─────────┼─────┼─────────┼─────────┤
│ 1   │ 14.0695 │ 9.93292 │ 0.0  │ 10.5082 │ 12.7531 │ 0.0 │ 12.2547 │ 12.2895 │
│ 2   │ 13.4571 │ 12.5378 │ 0.0  │ 9.98157 │ 14.3046 │ 0.0 │ 12.5239 │ 13.5173 │
│ 3   │ 13.6976 │ 10.9552 │ 0.0  │ 10.555  │ 13.2031 │ 0.0 │ 12.1651 │ 13.641  │
│ 4   │ 13.592  │ 11.3802 │ 0.0  │ 11.3047 │ 13.6553 │ 0.0 │ 13.3942 │ 13.9413 │
│ 5   │ 14.4982 │ 11.5432 │ 0.0  │ 11.5915 │ 13.8775 │ 0.0 │ 12.6017 │ 14.9448 │
│ 6   │ 13.7879 │ 12.3087 │ 0.0  │ 11.5723 │ 13.2337 │ 0.0 │ 13.379  │ 15.1738 │
│ 7   │ 12.8521 │ 11.2392 │ 0.0  │ 3.46462 │ 13.45   │ 0.0 │ 12.9797 │ 12.068  │
│ 8   │ 13.8368 │ 0.0     │ 0.0  │ 8.79081 │ 12.2692 │ 0.0 │ 12.8426 │ 15.0319 │

I’d like to have a boxplot per column.
I tried the tips at
https://github.com/JuliaPlots/Plots.jl/issues/210 and
http://johnmyleswhite.github.io/Vega.jl/boxplot.html.

But none of them ran or produced a nice plot.

I am using Julia version 0.6.2.

Have you tried PlotlyJS.jl? http://spencerlyon.com/PlotlyJS.jl/examples/box_plots/

I think the recipes are in StatPlots.jl.

https://github.com/JuliaPlots/StatPlots.jl


You can use ECharts.jl, though it only supports Jupyter Notebook and Juno at the moment.

http://randyzwitch.com/ECharts.jl/box#multiple-series

Chris Rackauckas is correct, this is in StatPlots.jl.

@df yourdataframe boxplot(:Smarcc1)

@df yourdataframe boxplot!(:Socs3)

will put two boxplots side by side. Note the ! in the second line.
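
For anyone who wants to paste and run this, here is a minimal self-contained sketch of the same two calls. It assumes the current package name StatsPlots (formerly StatPlots) and a small hypothetical `yourdataframe` built from the first three rows of two columns in the question; the `label` keywords are an addition for readability, not something the calls require.

```julia
using DataFrames
using StatsPlots  # assumption: current name of the package, formerly StatPlots

# Hypothetical data: first three rows of two columns from the question.
yourdataframe = DataFrame(Smarcc1 = [14.0695, 13.4571, 13.6976],
                          Socs3   = [9.93292, 12.5378, 10.9552])

# boxplot starts a new plot holding the first box.
p = @df yourdataframe boxplot(:Smarcc1, label = "Smarcc1")

# boxplot! (with the !) adds a second box to the current plot
# instead of starting a new one.
@df yourdataframe boxplot!(:Socs3, label = "Socs3")
```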

There is also Gadfly (similar to ggplot2). Start with the following links:
http://gadflyjl.org/stable/man/plotting.html#Plotting-wide-formatted-data-1
http://gadflyjl.org/stable/lib/scales/scale_x_discrete.html
http://gadflyjl.org/stable/lib/geometries.html
http://gadflyjl.org/stable/lib/geoms/geom_boxplot.html

One can even do:

@df yourdataframe boxplot([:Smarcc1 :Socs3])

Or, to do all the columns:

cn = names(yourdataframe)
@df yourdataframe boxplot(cols(cn))

Most plotting code of this kind assumes tidy (long-format) data, i.e.

Variable | Value
---------|---------
Smarcc1  | 14.0695
Smarcc1  | 13.4571
...
Tcf7l1   | 15.0319

and then

using StatPlots
@df yourdataframe boxplot(:Variable, :Value)

Use DataFrames.stack to reshape the DataFrame.
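
A rough sketch of that reshape, assuming a recent DataFrames.jl and the renamed StatsPlots package. The `wide` table is hypothetical (the first three rows of two columns from the question). Note that `stack` names its output columns `variable` and `value` in lowercase by default, so the plotting call uses those names.

```julia
using DataFrames
using StatsPlots  # assumption: current name of the package, formerly StatPlots

# Hypothetical wide table: first three rows of two columns from the question.
wide = DataFrame(Smarcc1 = [14.0695, 13.4571, 13.6976],
                 Tcf7l1  = [12.2895, 13.5173, 13.641])

# Melt the listed columns into long form: `variable` holds the old
# column name, `value` holds the measurement.
long = stack(wide, [:Smarcc1, :Tcf7l1])

# One box per distinct entry in `variable`.
@df long boxplot(:variable, :value, legend = false)
```

With the full dataframe, pass every gene column to `stack` in the same way.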


Since this comes up quite high on Google: StatPlots.jl has been renamed to StatsPlots.jl.

I am trying to do the same thing, plotting all the columns in the dataframe (selecting them one by one works), but this code doesn't work for me. The error message I get is:

ERROR: UndefVarError: cols not defined
Stacktrace:
 [1] top-level scope

Is there now a new way to select all the columns?

In the end I solved it using this:

N = ncol(float_df)
@df float_df boxplot(cols(1:N))

To plot all the columns, simply:

@df float_df boxplot(cols())

Good to know, thank you!