Fit histograms for multiple series in the same bar plot

I just ran into the age-old problem of wanting to plot bar histograms (using StatsPlots.jl) from multiple series of data as a grouped bar plot (i.e. bars from different series appearing next to each other in each bin).

AFAIK, for a grouped bar plot to work, all series must share exactly the same x-axis (the bins). Or have I gotten that wrong?

If not, is there any built-in convenience for this nowadays?

What I’ve done in the past is go through all series, find the global max and min, and then fit a histogram for each series using a fixed number of bins.

I guess one drawback is that you have to throw out all the auto-binning that StatsPlots/StatsBase does, and either implement your own or try different numbers of bins for each plot until the histograms look good. That is not ideal, since I want to do this in an automatic report-generation tool.
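A minimal, Base-Julia-only sketch of the workaround described above: take the global min/max over all series, build one shared set of bin edges, and count each series on those edges so the bins line up. `shared_histograms` is a hypothetical helper, and picking the bin count with Sturges' rule on the pooled sample size is an assumption made here to keep some automatic binning; it is not necessarily the rule StatsPlots/StatsBase would pick.

```julia
"""
    shared_histograms(series; nbins = nothing)

Hypothetical helper. Returns `(edges, counts)`, where `edges` spans the global
min/max of all series and `counts[i, j]` is the number of values of series `j`
falling in bin `i`. If `nbins` is omitted, Sturges' rule is applied to the
pooled sample size (an assumption, not necessarily the library's default).
"""
function shared_histograms(series::AbstractVector{<:AbstractVector{<:Real}};
                           nbins::Union{Nothing,Integer} = nothing)
    # Global range over every series, so all histograms share the same edges.
    lo = minimum(minimum, series)
    hi = maximum(maximum, series)
    n_total = sum(length, series)
    k = nbins === nothing ? ceil(Int, log2(n_total)) + 1 : Int(nbins)
    edges = range(lo, hi; length = k + 1)
    counts = zeros(Int, k, length(series))   # rows = bins, columns = series
    for (j, s) in enumerate(series), x in s
        # Left-closed bins [e_i, e_{i+1}); clamp so the global max lands in the last bin.
        i = clamp(searchsortedlast(edges, x), 1, k)
        counts[i, j] += 1
    end
    return edges, counts
end
```

For example, `edges, counts = shared_histograms([randn(123), randn(17)])` gives one column of counts per series on identical bins. The bin centres `(edges[1:end-1] .+ edges[2:end]) ./ 2` together with `counts` could then be handed to something like StatsPlots' `groupedbar`, though the exact call is untested here.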

It’s already built in, see the PR on GitHub: "[WIP] Grouped histogram (groupedhist)" by pearlzli, JuliaPlots/StatsPlots.jl Pull Request #319.
But yes, they need the same binning.


That looks wrong.


My apologies. After reading the OP’s question more carefully, I see that the proposed workaround does not fit the requirements.


Thanks a lot!

Looks like the DataFrame approach does the magic I was hoping to find, at the cost of having to put stuff in DataFrames. I will explore it some more and see if I can get it to work the way I want.

You shouldn’t have to put it into a DataFrame, no StatsPlots recipes require that.

Hmmm, this is the one which seems like it would fit my use case:

@df iris groupedhist(:SepalLength; group = (:Species, :Color))

If I instead have:

a = randn(123)
b = randn(17)
etc...

Is there a similar command for that case? I’m sorry if it follows naturally from one of the examples in StatsPlots. The ones I could find seem to want the input as a matrix, but maybe a vector of vectors also works?

Working example with a DataFrame:

julia> using DataFrames, StatsPlots

julia> a = randn(123);

julia> b = rand(4:10, 17);

julia> df = DataFrame(series=vcat(fill(:a, length(a)), fill(:b, length(b))), data=vcat(a,b));

julia> @df df groupedhist(:data, group=:series)

I already have DataFrames as a dependency, so it’s not the end of the world to have to make a new one.

You can just do groupedhist(data, group = series) if data is an array of values and series is a vector of group labels.

Sorry for being stupid here, but how do I form the data matrix when the vectors have different lengths? Do I need to pad rows with missing/NaN? Or can I just concatenate the vectors and make series index-aligned, i.e. the exact same format as in the DataFrame in the example?

Is it possible to create an MWE from the arrays in my example above?

Thanks for all your help!

Yes, the exact same format as in the DataFrame in the example.
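A sketch of that format for the vector-of-vectors case from the question, built without a DataFrame: concatenate the series into one flat vector and build an index-aligned label vector, so no padding with `missing`/`NaN` is needed. `long_format` is a hypothetical helper; the `groupedhist(data, group = series)` call is the one quoted in this thread and is left as a comment so the sketch runs with Base Julia alone.

```julia
# Hypothetical helper: flatten differently sized series into one data vector
# plus an index-aligned vector of group labels (the "long" format).
function long_format(series::AbstractVector{<:AbstractVector}, labels::AbstractVector)
    length(series) == length(labels) ||
        throw(ArgumentError("need exactly one label per series"))
    data = reduce(vcat, series)
    # Repeat each label once per element of its series, in the same order.
    group = reduce(vcat, [fill(l, length(s)) for (l, s) in zip(labels, series)])
    return data, group
end

a = randn(123)
b = rand(4:10, 17)
data, series = long_format([a, b], [:a, :b])

# With StatsPlots loaded, this should match the @df DataFrame example:
# using StatsPlots
# groupedhist(data, group = series)
```

This mirrors the `vcat(fill(:a, length(a)), fill(:b, length(b)))` construction from the DataFrame example, just generalized to any number of series.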


Thanks! It worked.

In hindsight I guess it is kinda obvious since this is basically what the @df macro expands to, right?

Exactly 🙂
