How to set histogram alignment in Plots.jl

mbeltagy · June 14, 2017, 2:44pm

In matplotlib.pyplot.hist there is an align keyword that changes the alignment of the histogram between “mid”, “left”, and “right”. This keyword is currently not supported in Plots.jl. Is there a way I could force-pass it to the pyplot graphical backend?

mkborregaard · June 14, 2017, 2:58pm

No, there is no way to pass backend-specific code through Plots. Plots calls are meant to be backend-agnostic.
To me personally that option sounds strange, like it would misrepresent the data.
If you want the behaviour you could just overload the ticks, e.g. xticks = (1:5+0.5, 1:5)

mbeltagy · June 14, 2017, 4:01pm

Something seems quite wrong with the way histogram in Plots is working. I cannot make it match what one gets from PyPlot. I just submitted an issue: https://github.com/JuliaPlots/Plots.jl/issues/930

mkborregaard · June 14, 2017, 7:09pm

I find it interesting that you assume something is wrong in Plots because it has a different output than PyPlot does? Is PyPlot the reference for ultimate truth?

On a more serious note, there are many different algorithms for calculating bin numbers and edge positions. Most algorithms try to hit edges that are easy to interpret and a bin number that represents the distribution. Plots by default uses Sturges’ method; this was chosen for consistency with StatsBase.Histogram, but it is also the most widespread (it is the default in R). But there are a lot of other possibilities.
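For readers who want to see how such rules turn a sample size into a bin count, here is a minimal sketch of three of them. It assumes the usual textbook formulas; the function names are made up for illustration, and the implementations inside Plots or StatsBase may differ in detail.

```julia
# Textbook bin-count rules for a sample of size n.
# Assumption: standard formulas, not code copied from Plots or StatsBase.
sturges(n)   = ceil(Int, log2(n)) + 1   # Sturges: ceil(log2(n)) + 1
sqrt_rule(n) = ceil(Int, sqrt(n))       # square-root rule
rice(n)      = ceil(Int, 2 * cbrt(n))   # Rice rule: 2 * n^(1/3)

n = 200
println((sturges = sturges(n), sqrt = sqrt_rule(n), rice = rice(n)))
```

For the same 200 points the three rules suggest noticeably different counts, which is one reason two plotting libraries can disagree without either being wrong.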

The binning in Plots is determined by the bins parameter. To see the default values and types, aliases and various options for plot attributes, you can use the plotattr function on an attribute (or any of its aliases):

julia> plotattr("bins")
bins {Integer, NTuple{2,Integer}, AbstractVector or Symbol}
bin, binss, nb, nbin, nbins

Default is :auto. For histogram-types, defines the approximate number of bins to aim for, or the auto-binning algorithm to use (:sturges, :sqrt, :rice, :scott or :fd). For fine-grained control pass a Vector of break values, e.g. `linspace(extrema(x)..., 25)`
Series attribute,  default: auto

If you don’t like the result, try a different algorithm. If you want complete control over bins, pass an AbstractVector with edge positions. If you want exactly 30 bins, use linspace(extrema(x)..., 31), since 31 edges delimit 30 bins.
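On Julia 1.0 and later, linspace is no longer available and range takes its place. The sketch below builds the edge vector with Base only; the sample x is hypothetical, and the histogram call is left as a comment because it needs Plots to be loaded.

```julia
# Hypothetical sample; any numeric vector works.
x = [0.3, 1.2, 2.5, 4.0, 7.7, 9.1]
nbins = 30

lo, hi = extrema(x)
# n bins are delimited by n + 1 edges.
edges = range(lo, stop = hi, length = nbins + 1)

println(length(edges) - 1)  # number of bins delimited by the edges
# With Plots loaded, `histogram(x, bins = edges)` would use these edges.
```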

dpsanders · June 14, 2017, 9:43pm

Great answer @mkborregaard, thanks!

dpsanders · June 14, 2017, 9:45pm

Is this kind of thing in the docs / Examples somewhere?

mkborregaard · June 15, 2017, 5:48am

Thanks! The output of plotattr can also be found at https://juliaplots.github.io/attributes/ but I just checked and the current build of the webpage is from before @oschulz added the new and expanded histogram functionality to Plots.

One of the things that’s high on the priority list is to have a section in the docs for each seriestype, so there will be a page on histogram capabilities in the (hopefully near) future.

mbeltagy · June 15, 2017, 7:12am

Thank you @mkborregaard. Far from thinking that PyPlot is the ultimate truth, I had assumed that Plots would be making use of PyPlot’s histogram plotting function. Hence the confusion. Now I understand that it is doing its own thing, just working with the bare primitives of the “backend”.

mbeltagy · June 19, 2017, 12:05pm

@mkborregaard Something still seems completely non-intuitive about setting bins; this is why I created an issue about it.
For example if you run:

using Distributions, Plots
pyplot()
srand(15); x=rand(Exponential(4),200);
for i=10:5:50
    histogram(x,bins=i)
    savefig("TestPlotFigPlts_$i.png")
end

The 10 and 15 bin plots will be identical: both show 11 bars. The 20, 25, 30, and 35 bin plots will also be identical, and so will the 40, 45, and 50 bin plots.

I would expect the number of bars to match the number of bins. This does not seem to be happening. I know that it can be exactly controlled via linspace(extrema(x)..., real_bins)

The closest I could get to perform as desired is to run

using Distributions, Plots
pyplot()
srand(15); x=rand(Exponential(4),200);
for i=10:5:50
    histogram(x,bins=linspace(extrema(x)..., i+1))
    savefig("TestPlotFigPlts_$i.png")
end

mkborregaard · June 19, 2017, 12:16pm

I guess what is intuitive is different for different people. The advantage of this is that it’s versatile. If you want some specific bins or want several histograms to have the same bins, just define a vector with the edge positions and pass it to bins. If you do want to advocate to have it changed you should re-open the issue, but I strongly doubt most of JuliaPlots will agree with you - since we just had a 3-month discussion developing the current system.

mkborregaard · June 19, 2017, 12:38pm

Let me just state what I mean - if you always expect the bin numbers to be exact, what should then constitute the minimum and maximum values? Then that is determined by the most extreme points, which means that the bin locations and sizes will never be consistent but go all over the place only determined by the (essentially random) extrema. To have any kind of consistency, you’ll need to specify the highest and lowest bin edge. That is, in fact exactly what you do when specifying bins = linspace(-5,5,20) (or -5:5 if you’re happy with integer bins), so that cannot be any more succinct than what it is now.
Then you may wish to align the extrema to some sensible values (like integers), but then you should also align the edges to sensible values. That is in fact what the system does now - you get the exact number of bins, but some of them towards the ends of the distribution are just empty and thus don’t show up! You cannot have all of these at once - exact bin numbers, a guarantee that the edge-most bins contain points, and stability to outliers. That’s mathematically impossible.
I should add that the functionality you showed for pyplot in the issue is highly unusual - some variant of what Plots does now is standard across statistical software.
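To illustrate why several requested bin counts can collapse onto the same plot, here is a rough sketch of edge alignment. The rounding rule, the nice_step name and the data range of 21 are all assumptions chosen for illustration; this is not the code Plots or StatsBase actually runs.

```julia
# Illustrative rounding rule (an assumption, NOT the actual Plots/StatsBase code):
# round a raw bin width up to the nearest 1, 2, 5 or 10 times a power of ten.
function nice_step(raw)
    magnitude = exp10(floor(log10(raw)))
    for m in (1, 2, 5, 10)
        m * magnitude >= raw && return m * magnitude
    end
    return 10 * magnitude  # safety fallback; the loop always returns first
end

span = 21.0  # hypothetical data range
for requested in 10:5:50
    width = nice_step(span / requested)
    println(requested, " requested -> width ", width, ", ",
            ceil(Int, span / width), " bins")
end
```

Under this toy rule, requests of 25, 30, 35 and 40 bins all round to the same width, so they would produce the same histogram. That is the same kind of plateau described earlier in the thread, although the exact plateaus depend on the real rounding rule.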

mbeltagy · June 19, 2017, 12:59pm

Thank you for clarifying the workings of Plots.jl’s histogram function. This makes a great deal of sense now. One is often biased by the tools that one uses. In my case, since I haven’t explored much beyond matplotlib, I was blind to the richness of choices in the ways one can construct histograms. I now have a better appreciation of the limitations of matplotlib’s hist function.

Roger-luo · May 3, 2019, 8:56am

Just in case someone else finds this doesn’t work in 1.0, since

julia> 1:5 + 0.5
1.0:1.0:5.0

one should write

xticks = ((1:5) .+ 0.5, 1:5)

in 1.0
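The difference comes down to operator precedence: + binds tighter than :, so the unparenthesized form is a range ending at 5.5 rather than a shifted range. A small check with Base only (the variable names are just for illustration):

```julia
# `+` binds tighter than `:`, so this is the range 1:5.5, not a shifted 1:5.
unshifted = 1:5 + 0.5
# Parenthesize the range and broadcast the addition to shift every tick.
shifted = (1:5) .+ 0.5

println(collect(unshifted))  # [1.0, 2.0, 3.0, 4.0, 5.0]
println(collect(shifted))    # [1.5, 2.5, 3.5, 4.5, 5.5]

# Tuple of (tick positions, tick labels), the shape the xticks attribute expects.
ticks = (shifted, 1:5)
```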

abelborges · August 1, 2020, 2:47pm

And if someone else is still looking into this topic, I’m talking from Julia 1.4.2 and Plots.jl 1.5.6 and I’ve just discovered that Freedman-Diaconis is the default now for the number of bins (idk since when). Personally, I find it much better than Sturges’ rule. That’s the post
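For reference, the Freedman-Diaconis rule picks the bin width from the interquartile range rather than from the sample size alone. A minimal sketch follows, assuming the textbook formula; the fd_width name and the sample are hypothetical, and Plots may post-process the result differently.

```julia
using Statistics  # standard library, provides quantile

# Freedman-Diaconis: bin width = 2 * IQR / n^(1/3).
function fd_width(x)
    iqr = quantile(x, 0.75) - quantile(x, 0.25)
    return 2 * iqr / cbrt(length(x))
end

x = collect(1.0:100.0)  # hypothetical sample
width = fd_width(x)
# Number of bins needed to cover the data range at that width.
nbins = ceil(Int, (maximum(x) - minimum(x)) / width)
println((width = width, nbins = nbins))
```

Because the width depends on the spread of the middle half of the data, a few extreme outliers change it much less than they would change a rule based on the full range.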
