Getting bins from Plots.jl histogram

I can make a histogram plot by

histogram(rand(1000))

Can I get the values of the resulting binning?

Check StatsBase.jl:

using StatsBase, Plots
h = fit(Histogram, rand(100), nbins=10)
plot(h)

Then access bin values with h.weights:

julia> h
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.1:1.0
weights: [7, 9, 9, 15, 12, 6, 10, 18, 6, 8]
closed: left
isdensity: false
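
As a minimal sketch of reading those values back out (the sample data and the variable names `edges` and `counts` are illustrative, not part of the answer above), the counts live in `h.weights` and the bin boundaries in `h.edges[1]`:

```julia
using StatsBase

data = rand(1000)                    # illustrative sample data
h = fit(Histogram, data, nbins=10)   # nbins is only a hint to StatsBase

edges  = h.edges[1]   # bin boundaries along the first (and only) dimension
counts = h.weights    # one count per bin

# Print each bin as a half-open interval with its count (closed = :left by default)
for i in eachindex(counts)
    println("[", edges[i], ", ", edges[i + 1], "): ", counts[i])
end
```

Every sample falls in exactly one bin here, so the counts should add up to the number of samples.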

I recall from an SO question I answered some time ago that the bin construction differs between Plots and StatsBase (histogram2d - Return the frequency in a bin of a 2D histogram in Julia - Stack Overflow); I'm not sure whether that was ever aligned.

@nilshg, thanks for the link. For 2D histograms, the StatsBase functionality seems to be more limited, with no user control over the 2D bins (at least it is not documented).

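# Computing a 2D histogram
using StatsBase, Plots
data = (randn(10_000), randn(10_000))
h = fit(Histogram, data, nbins=40)  # open question: how to finely control the size of the 2D bins?
# Bin centers: midpoints of consecutive edges in each dimension
y = diff(h.edges[1])/2 .+ h.edges[1][1:end-1]  # first data dimension, placed on the y-axis
x = diff(h.edges[2])/2 .+ h.edges[2][1:end-1]  # second data dimension, placed on the x-axis
heatmap(x, y, h.weights)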
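
One way to control the 2D bins, sketched here under the assumption that `fit` accepts a tuple of edge ranges (one per dimension) in place of `nbins`, is to pass the edges explicitly. The ranges `xedges` and `yedges` below are hypothetical choices for standard-normal data:

```julia
using StatsBase

data = (randn(10_000), randn(10_000))   # illustrative (x, y) sample

# Hypothetical bin layouts: 0.25-wide bins for the first variable,
# 0.5-wide bins for the second
xedges = -4.0:0.25:4.0
yedges = -4.0:0.5:4.0

h = fit(Histogram, data, (xedges, yedges))

size(h.weights)   # (length(xedges) - 1, length(yedges) - 1)
```

As far as I can tell, samples falling outside the given edges are simply not counted, so the edges should cover the range of interest.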

Yes, they don't always do the same thing :(

@OvidiusCicero, please provide a 1D example where the problem arises; it would be useful.

It's difficult to share the simulation data, but I need to set nbins manually to get the same number of bins (which makes a pipeline more difficult).

At the moment I plot the result of fit with bar.

I could not find obvious problems but did not try hard enough.
See example of perfect overlap below:

using StatsBase, Plots
data1d = rand(100)
histogram(data1d, bins=10, label="histogram plot", legend=:topleft, ylims=(0,20))
h = fit(Histogram, data1d, nbins=10)
plot!(h, seriestype=:steps, lw=3, lc=:blue, label="StatsBase histogram")
savefig("histogram_plot_vs_StatsBase_histogram.png")

[Figure: histogram plot vs. StatsBase histogram]

You can get 2D histograms with GMT

using GMT

# Compute a grid with the counts
G = blockmean(rand(100,2) * 100, region=(0,100,0,100), inc=10, npts=:n, grid=true);
# Plot it
bar3(G, fill=[0,115,190], lw=0.25, fmt=:png, show=true)
# Convert to x,y,z. Empty cells would have NaNs, the *skip_NaN* option takes care of it
D = grd2xyz(G, skip_NaN=true);
# Data is in the *data* field
D[1].data
65×3 Matrix{Float64}:
  20.0  100.0  1.0
  30.0  100.0  1.0
  50.0  100.0  1.0
  70.0  100.0  1.0
  90.0  100.0  1.0
   0.0   90.0  2.0
...

Sorry, something is not right. It's not giving the counts.

EDIT: Now it is, but I have to see why npts=true alone did not work as it should.

I can't make a minimal working example out of it, but the default binning of Plots.jl is different from that of StatsBase, so you need to explicitly set bins (Plots.jl) and nbins (StatsBase) to get the same results.

Otherwise, the selected solution works
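
A sketch of one way to sidestep the differing defaults, assuming both `histogram` (via `bins`) and `fit` accept the same explicit range of bin edges; the `edges` range is an illustrative choice that should be adapted to the data:

```julia
using StatsBase, Plots

data  = rand(100)       # illustrative data
edges = 0.0:0.1:1.0     # assumed bin edges; adapt to the data range

histogram(data, bins=edges, label="Plots histogram")   # edges passed to Plots
h = fit(Histogram, data, edges)                        # same edges passed to StatsBase
plot!(h, seriestype=:steps, lw=3, label="StatsBase histogram")
```

With shared edges, neither package has to pick a binning on its own, so the pipeline no longer depends on either default.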

Thank you. Is it however usual in Julia that the returned data structures are so “bloated”?

In order to get the numerical values of the bins, I had to resort to typing

collect(h.edges[1])

after discovering that h.edges is a 1-element Tuple containing a range. And yet, it does not even have the same length as h.edges :(
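
For reference, a small sketch of how the pieces relate in the 1D case (explicit edges are used here only so that the bin count is known in advance; `centers` is an illustrative name):

```julia
using StatsBase

h = fit(Histogram, rand(100), 0.0:0.1:1.0)   # explicit edges, so the bin count is known

length(h.edges)       # 1: a tuple with one entry per dimension
edges = h.edges[1]    # a range; collect(edges) gives a plain Vector
length(edges)         # 11: n bins need n + 1 boundaries
length(h.weights)     # 10: one count per bin

# Bin centers, which have the same length as h.weights
centers = (edges[1:end-1] .+ edges[2:end]) ./ 2
```

The edges are boundaries rather than bin positions, which is why there is always one more edge than there are counts.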

@mgiugliano, the developers of StatsBase will have the correct answer to your question.
As a user, I can only remark, from the source code annotation, that edges is an iterator containing the boundaries of the bins in each dimension, which in the 1D case corresponds to objects like (0.0:0.1:1.0,). But I do not know why it was defined this way.

I don't know why it was defined that way either, but in contrast to “bins,” which appears to be treated as a suggestion, passing edges (as a tuple of “StepRangeLen” ranges to the StatsBase Histogram “fit”) fixes the bin size, which is a nice feature. It can also be used in two dimensions, which you mentioned in the post above from 3 Feb 2021, e.g.,

using StatsBase

# x, y: coordinate vectors; W: per-point weights (all of the same length)
firstEdge = 0.0
lastEdge = 10.0
binSize = 1.0
EdgeRange = firstEdge:binSize:lastEdge
h = fit(Histogram, (x, y), weights(W), (EdgeRange, EdgeRange))

Did you find that if you first plot just h directly with “plot(h)” and compare it to using h.weights as input to heatmap (again the 3 Feb post), x and y are flipped? It seems crazy, but I had to use “transpose(h.weights)” to get the same orientation in the heatmap. Am I missing something? I'm using Julia 1.7.0.
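
A small sketch of where the flip may come from, under my understanding that `h.weights[i, j]` is indexed by (bin of the first variable, bin of the second variable), while `heatmap(xc, yc, z)` in Plots reads the rows of `z` along the y-axis and the columns along the x-axis. The data and edges are illustrative, chosen so the matrix is not square:

```julia
using StatsBase

# Illustrative points: three fall in the first x-bin and the second y-bin
x = [0.5, 0.5, 0.5, 1.5]
y = [1.5, 1.5, 1.5, 0.5]
xedges = 0.0:1.0:2.0
yedges = 0.0:1.0:3.0   # three y-bins, so the matrix is not square

h = fit(Histogram, (x, y), (xedges, yedges))

size(h.weights)    # (2, 3): rows index x-bins, columns index y-bins
h.weights[1, 2]    # 3: first x-bin, second y-bin

# A rows-along-y consumer such as heatmap needs the transposed layout
z = permutedims(h.weights)   # size (3, 2): rows index y-bins, columns index x-bins
```

If that reading is right, passing `z` (or `transpose(h.weights)`) together with the x-centers and y-centers in that order puts the first variable on the x-axis, which would explain why the transpose was needed.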