How to set histogram alignment in Plots.jl


#1

In matplotlib.pyplot.hist there is an align keyword that sets the alignment of the histogram bars to “mid”, “left”, or “right”. This keyword is currently not supported in Plots.jl. Is there a way I could force-pass it to the pyplot graphical backend?


#2

No, there is no way to pass backend-specific code through Plots. Plots calls are meant to be backend-agnostic.
To me personally that option sounds strange, like it would misrepresent the data.
If you want the behaviour you could just overload the ticks, e.g. xticks = ((1:5) .+ 0.5, 1:5)
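
A small sketch of the tick tuple, in case the operator precedence is surprising: `:` binds more loosely than `+`, so the range needs parentheses before the offset is added. The variable names here are just for illustration, and the half-unit offset assumes unit-width bins.

```julia
# `:` binds more loosely than `+`, so this parses as 1:(5 + 0.5), i.e. 1.0:1.0:5.0
unshifted = collect(1:5+0.5)

# Parenthesise the range and broadcast the offset to get shifted tick positions
positions = collect((1:5) .+ 0.5)

# (positions, labels) tuple in the form passed to the xticks attribute above
ticks = (positions, 1:5)
```

The tuple can then be passed as `xticks = ticks` in the histogram call.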


#3

Something seems quite wrong with the way histogram in Plots is working. I cannot make it match what one gets from PyPlot. I just submitted an issue: https://github.com/JuliaPlots/Plots.jl/issues/930


#4

I find it interesting that you assume something is wrong in Plots because it has a different output than PyPlot does? Is PyPlot the reference for ultimate truth?

On a more serious note, there are many different algorithms for calculating the number of bins and the edge positions. Most algorithms aim for edges that are easy to interpret and a bin count that represents the distribution well. By default Plots uses Sturges’ method. It was chosen for consistency with StatsBase.Histogram, and it is also the most widespread method (it is the default in R). But there are a lot of other possibilities.
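
To give a feel for how much the rules differ, here is a rough sketch of three of them in their textbook form. The function names are made up for this example, and the exact rounding inside Plots or StatsBase may differ, so treat the numbers as approximate targets rather than what Plots will draw. The Scott and Freedman-Diaconis rules also depend on the spread of the data and are left out.

```julia
# Textbook bin-count rules for a sample of size n
# (assumed forms; a library implementation may round differently)
sturges_bins(n) = ceil(Int, log2(n)) + 1
sqrt_bins(n)    = ceil(Int, sqrt(n))
rice_bins(n)    = ceil(Int, 2 * cbrt(n))

n = 200
for (name, rule) in (("sturges", sturges_bins), ("sqrt", sqrt_bins), ("rice", rice_bins))
    println(name, " => ", rule(n), " bins")
end
```

For the same 200 points the three rules already suggest quite different bin counts, which is why two tools with different defaults rarely produce the same histogram.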

The binning in Plots is determined by the bins parameter. To see the default values and types, aliases and various options for plot attributes, you can use the plotattr function on an attribute (or any of its aliases):

julia> plotattr("bins")
bins {Integer, NTuple{2,Integer}, AbstractVector or Symbol}
bin, binss, nb, nbin, nbins

Default is :auto. For histogram-types, defines the approximate number of bins to aim for, or the auto-binning algorithm to use (:sturges, :sqrt, :rice, :scott or :fd). For fine-grained control pass a Vector of break values, e.g. `linspace(extrema(x)..., 25)`
Series attribute,  default: auto

If you don’t like the result, try a different algorithm. If you want complete control over bins, pass an AbstractVector with edge positions. If you want exactly 30 bins, use linspace(extrema(x)..., 31), since n bins need n + 1 edges.
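
The relation between edges and bins can be checked without any plotting package. This is only a sketch: `bin_counts` is a hypothetical helper written for this post, not a Plots or StatsBase function, and the choice of left-closed bins is an assumption. It is written for Julia 1.x, where `linspace(a, b, n)` is spelled `range(a, stop=b, length=n)`.

```julia
# Hypothetical helper: count how many values fall into each bin defined by `edges`.
# Bins are left-closed, [edges[i], edges[i+1]), except the last one,
# which also includes its right edge.
function bin_counts(x, edges)
    counts = zeros(Int, length(edges) - 1)
    for v in x
        (v < first(edges) || v > last(edges)) && continue  # outside all bins
        i = searchsortedlast(edges, v)   # index of the last edge <= v
        i = min(i, length(counts))       # v == last(edges) goes into the final bin
        counts[i] += 1
    end
    return counts
end

x = [0.0, 0.5, 1.2, 2.6, 3.4, 3.9, 5.1, 8.0]
nbins = 4
lo, hi = extrema(x)
edges = range(lo, stop=hi, length=nbins + 1)   # 5 edges delimit 4 bins
counts = bin_counts(x, edges)
```

Passing `edges` as the `bins` attribute should then give exactly `nbins` bars, as long as none of the bins is empty.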


#5

Great answer @mkborregaard, thanks!


#6

Is this kind of thing in the docs / Examples somewhere?


#7

Thanks! The output of plotattr can also be found at https://juliaplots.github.io/attributes/ but I just checked and the current build of the webpage is from before @oschulz added the new and expanded histogram functionality to Plots.

One of the things that’s high on the priority list is to have a section in the docs for each seriestype, so there will be a page on histogram capabilities in the (hopefully near) future.


#8

Thank you @mkborregaard. Far from thinking that PyPlot is the ultimate truth, I had assumed that Plots would be making use of PyPlot’s histogram plotting function. Hence the confusion. Now I understand that it is doing its own thing, just working with the bare primitives of the “backend”.


#9

@mkborregaard Something still seems completely non-intuitive about setting bins, which is why I created an issue about it.
For example if you run:

using Distributions, Plots
pyplot()
srand(15); x=rand(Exponential(4),200);
for i=10:5:50
    histogram(x,bins=i)
    savefig("TestPlotFigPlts_$i.png")
end

The plots for 10 and 15 bins will be identical: both show 11 bars. The plots for 20, 25, 30, and 35 bins will also be identical, and so will those for 40, 45, and 50 bins.

I would expect the number of bars to match the number of bins. This does not seem to be happening. I know that it can be controlled exactly via linspace(extrema(x)..., real_bins)

The closest I could get to the desired behaviour is to run:

using Distributions, Plots
pyplot()
srand(15); x=rand(Exponential(4),200);
for i=10:5:50
    histogram(x,bins=linspace(extrema(x)..., i+1))
    savefig("TestPlotFigPlts_$i.png")
end

#10

I guess what is intuitive is different for different people. The advantage of this is that it’s versatile. If you want some specific bins or want several histograms to have the same bins, just define a vector with the edge positions and pass it to bins. If you do want to advocate to have it changed you should re-open the issue, but I strongly doubt most of JuliaPlots will agree with you - since we just had a 3-month discussion developing the current system.


#11

Let me just state what I mean: if you always expect the number of bins to be exact, what should determine the minimum and maximum edges? They would have to be the most extreme points, which means that the bin locations and sizes will never be consistent but go all over the place, determined only by the (essentially random) extrema. To have any kind of consistency, you’ll need to specify the highest and lowest bin edge. That is in fact exactly what you do when specifying bins = linspace(-5,5,20) (or -5:5 if you’re happy with integer bins), so it cannot be any more succinct than it is now.
Then you may wish to align the extrema to some sensible values (like integers), but then you should also align the edges to sensible values. That is in fact what the system does now: you get the exact number of bins, but some of them towards the ends of the distribution are empty and thus don’t show up! You cannot have all of these at once: exact bin numbers, a guarantee that the outermost bins contain points, and stability against outliers. That’s mathematically impossible.
I should add that the functionality you showed for pyplot in the issue is highly unusual - some variant of what Plots does now is standard across statistical software.
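
A toy illustration of the stability point, with made-up numbers and a hypothetical helper `data_edges` (this is not how Plots computes its edges). It uses the Julia 1.x spelling `range(a, stop=b, length=n)` in place of `linspace(a, b, n)`.

```julia
# Edges derived from the data: exactly nbins bins, but positions follow the extremes
data_edges(x, nbins) = range(minimum(x), stop=maximum(x), length=nbins + 1)

# Edges fixed in advance: identical for every sample, so histograms stay comparable
fixed_edges = range(-5.0, stop=5.0, length=21)   # 21 edges delimit 20 bins

a = [-1.0, -0.5, 0.0, 0.5, 1.0]
b = [-1.0, -0.5, 0.0, 0.5, 4.0]   # same sample with one outlier

edges_a = data_edges(a, 4)   # spans -1.0 to 1.0
edges_b = data_edges(b, 4)   # spans -1.0 to 4.0, so every bin is wider
```

A single outlier changes the width and position of every data-derived bin, while `fixed_edges` stays the same for both samples.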


#12

Thank you for clarifying the workings of Plots.jl's histogram function. This makes a great deal of sense now. One is often biased by the tools that one uses. In my case, I haven’t explored much beyond matplotlib, and I was blind to the richness of choices in the ways one can construct histograms. I now have a better appreciation of the limitations of matplotlib's hist function.