Histogram bars become a line when there are many observations?

I noticed that when there are a lot of (e.g. 10 million) observations, histogram produces a line plot instead of a bar plot:

using Distributions, StatsPlots
x = rand(Normal(), 1_000_000)
histogram(x, bins = -5:0.2:5, label = "x")
savefig("x.png")

[Image: x.png]

y = rand(Normal(), 10_000_000)
histogram(y, bins = -5:0.2:5, label = "y")
savefig("y.png")

[Image: y.png]

The same thing happens when I pass in normalize = true. Is this behavior intentional? Is there a way to turn it off, i.e. force there to be bars? I’m using Distributions v0.21.11 and StatsPlots v0.13.0.

It does not look like it


Maybe you can manually specify seriestype to be barhist instead of calling histogram?

Yes, calling plot(x, seriestype = :barhist) did it for me - thanks!
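For anyone landing here later, a minimal sketch of that workaround, using the same bins as in the question. It assumes only that Plots is installed (StatsPlots re-exports it, so `using StatsPlots` should work as well); the variable name `p` is just for illustration.

```julia
using Plots  # `using StatsPlots` also brings these functions into scope

y = randn(10_000_000)  # standard normal draws, large sample as in the question

# Ask for the bar-style histogram series explicitly instead of going
# through histogram, which may pick a step-style outline for large samples.
p = plot(y, seriestype = :barhist, bins = -5:0.2:5, label = "y")
```

Passing `normalize = true` in the same call should work as it does with `histogram`.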

Exactly. It’s a bit of an over-magic compromise: @oschulz and I couldn’t agree on which of the two should be the default, since we come from scientific fields with very different typical n.

Yes - I had originally proposed to use stephist as the default, because it works for few and very many bins, but bar-style histograms seemed to be too popular to not make them the default. But maybe this magic should depend on the number of bins, not the number of observations? @mkborregaard, what do you think?

The ideal solution, long term, would be to have something like a filled step hist with lines between the bins as the default, and make the lines between the bins thinner/weaker as the bin count increases, until they finally vanish if there are very many bins. This way, we’d have a smooth transition between both styles as the default.

I agree it’s better to base it on the number of bins, but in the current implementation the series type is chosen before autobins are calculated.
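When the bin edges are passed explicitly, as in the original question, the bin count is known up front, so the choice can be made on the user side. A rough sketch, assuming a hypothetical helper `hist_seriestype` and an arbitrary cutoff of 200 bins; neither is part of Plots or StatsPlots:

```julia
# Hypothetical helper, not part of Plots or StatsPlots: choose a histogram
# series type from the number of bins implied by explicit bin edges.
# The default cutoff of 200 bins is an arbitrary illustrative value.
function hist_seriestype(edges::AbstractVector; max_bar_bins::Integer = 200)
    nbins = length(edges) - 1  # n edges delimit n - 1 bins
    return nbins > max_bar_bins ? :stephist : :barhist
end

coarse = hist_seriestype(-5:0.2:5)    # 50 bins
fine   = hist_seriestype(-5:0.001:5)  # 10_000 bins
```

The result can then be passed along, e.g. plot(y, seriestype = hist_seriestype(edges), bins = edges). This does not help when the bins are chosen automatically, for the reason given above.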

in the current implementation the series type is chosen before autobins are calculated

Ah, right, that was the reason!