Getting bins from Plots.jl histogram

OvidiusCicero · February 17, 2021, 11:03am

I can make a histogram plot by

histogram(rand(1000))

Can I get the values of the resulting binning?

rafael.guerra · February 17, 2021, 12:25pm

Check StatsBase.jl:

using StatsBase, Plots
h = fit(Histogram, rand(100), nbins=10)
plot(h)

Then access bin values with h.weights:

julia> h
Histogram{Int64, 1, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.1:1.0
weights: [7, 9, 9, 15, 12, 6, 10, 18, 6, 8]
closed: left
isdensity: false

nilshg · February 17, 2021, 12:45pm

I recall from an SO question I answered some time ago that the bin construction is different in Plots and StatsBase (Stack Overflow: "histogram2d - Return the frequency in a bin of a 2D histogram in Julia"). I'm not sure that was ever aligned.

rafael.guerra · February 17, 2021, 1:06pm

@nilshg, thanks for the link. For 2D histograms, the StatsBase functionality seems to be more limited, with no user control on the 2d bins (at least it is not documented).

# computing 2D histograms
data = (randn(10_000), randn(10_000))
h = fit(Histogram, data, nbins=40)  # how to finely control the size of the 2D bins?
y = diff(h.edges[1])/2 .+ h.edges[1][1:end-1]
x = diff(h.edges[2])/2 .+ h.edges[2][1:end-1]
heatmap(x, y, h.weights)

OvidiusCicero · February 17, 2021, 1:17pm

Yes, they don’t always produce the same binning.

rafael.guerra · February 17, 2021, 1:23pm

@OvidiusCicero, please provide a 1D example where the problem arises; it will be useful.

OvidiusCicero · February 17, 2021, 1:50pm

It’s difficult to provide the simulation data, but I need to set nbins manually to get the same number of bins (this makes a pipeline more difficult).

At the moment I plot the result of fit with bar.

rafael.guerra · February 17, 2021, 1:59pm

I could not find obvious problems but did not try hard enough.
See example of perfect overlap below:

using StatsBase, Plots
data1d = rand(100)
histogram(data1d, bins=10, label="histogram plot", legend=:topleft, ylims=(0,20))
h = fit(Histogram, data1d, nbins=10)
plot!(h, seriestype=:steps, lw=3, lc=:blue, label="StatsBase histogram")
savefig("histogram2d_vs_StatsBase_histogram.png")

[Image: histogram plot vs. StatsBase histogram]

joa-quim · February 17, 2021, 2:47pm

You can get 2D histograms with GMT

using GMT

# Compute a grid of counts
G = blockmean(rand(100,2) * 100, region=(0,100,0,100), inc=10, npts=:n, grid=true);
# Plot it
bar3(G, fill=[0,115,190], lw=0.25, fmt=:png, show=true)
# Convert to x,y,z. Empty cells would have NaNs, the *skip_NaN* option takes care of it
D = grd2xyz(G, skip_NaN=true);
# Data is in the *data* field
D[1].data
65×3 Matrix{Float64}:
  20.0  100.0  1.0
  30.0  100.0  1.0
  50.0  100.0  1.0
  70.0  100.0  1.0
  90.0  100.0  1.0
   0.0   90.0  2.0
...

joa-quim · February 17, 2021, 2:59pm

Sorry, something is not right. It’s not giving the counts.

EDIT: Now it is, but I have to see why npts=true alone did not work as it should.

OvidiusCicero · February 17, 2021, 9:27pm

I can’t make a minimal working example out of it, but the default binning of Plots.jl is different from that of StatsBase, so you need to explicitly set bins and nbins respectively to get the same results.

Otherwise, the selected solution works
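
One way to avoid depending on either package's default binning might be to fix the bin edges yourself. The sketch below uses made-up data and an illustrative edge range (both are assumptions, not from the posts above) and compares nbins, which StatsBase treats as a hint, with explicit edges, which are used as given:

```julia
using StatsBase

# Illustrative data: 100 evenly spaced values strictly inside [0, 1)
data = [(k + 0.5) / 100 for k in 0:99]

# nbins is only a hint: StatsBase may pick a different number of "nice" bins
h_auto = fit(Histogram, data, nbins=7)
println("requested 7 bins, got ", length(h_auto.weights))

# Explicit edges are used as given, so the bin count is fully determined
edges = 0.0:0.1:1.0
h_fixed = fit(Histogram, data, edges)
println("explicit edges give ", length(h_fixed.weights), " bins")
```

The same edges can then be handed to Plots.jl with histogram(data, bins=edges), so that both libraries bin the data identically.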

mgiugliano · November 29, 2021, 9:18am

Thank you. Is it however usual in Julia that the returned data structures are so “bloated”?

In order to get to the numerical values of the bins, I had to resort to typing

collect(h.edges[1])

after discovering that h.edges is a 1-element Tuple containing a range. And yet, it is not even the same length as h.weights.

rafael.guerra · November 29, 2021, 9:40am

@mgiugliano, the developers of StatsBase will have the correct answer to your question.
As a user, I can only remark that, according to the source code annotation, edges is an iterator that contains the boundaries of the bins in each dimension, which for the 1D case corresponds to objects like (example): (0.0:0.1:1.0,). But I do not know why it has been defined this way.
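
For what it is worth, here is a minimal sketch of how the pieces fit together in 1D. The sample values and the edge range are made up for illustration. Since edges holds bin boundaries, it has one more element than weights; bin midpoints have the same length as weights:

```julia
using StatsBase

data = [0.05, 0.15, 0.15, 0.35, 0.95]   # small fixed sample (illustrative)
h = fit(Histogram, data, 0.0:0.1:1.0)   # explicit edges, so the bins are known

edges  = collect(h.edges[1])            # bin boundaries: one more than the bins
counts = h.weights                      # one count per bin
centers = (edges[1:end-1] .+ edges[2:end]) ./ 2   # bin midpoints

println(length(edges), " edges, ", length(counts), " counts, ",
        length(centers), " centers")
```

With the midpoints in hand, something like bar(centers, counts) should line up with the bins.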

dfc123 · January 7, 2022, 4:57pm

I don’t know why it was defined that way either, but in contrast to “bins,” which appears to be treated as a suggestion, passing edges (as a tuple of “StepRangeLen” ranges to the StatsBase Histogram “fit”) fixes the bin size, which is a nice feature. It can also be used in two dimensions, which you mentioned in the post above from 3 Feb 2021, e.g.,

firstEdge = 0.0
lastEdge = 10.0
binSize = 1.0
EdgeRange = (firstEdge:binSize:lastEdge)
h = fit(Histogram, (x, y), weights(W), (EdgeRange, EdgeRange))

Did you find that if you first plot just h directly, “plot(h)”, and compare to using h.weights as an input to heatmap (again the 3 Feb post), x and y are flipped? Seems crazy, but I had to “transpose(h.weights)” to get the same orientation in the heatmap. Am I missing something? I’m using Julia 1.7.0
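
A small check of the orientation, as a sketch with made-up points and edges (not the data discussed above): in a 2D StatsBase histogram the first index of h.weights follows the first data vector and the second index follows the second.

```julia
using StatsBase

# Illustrative points: three fall in the first x-bin and the second y-bin
x = [0.5, 0.5, 0.5, 2.5]
y = [1.5, 1.5, 1.5, 0.5]

xedges = 0.0:1.0:3.0   # 3 bins along x
yedges = 0.0:1.0:2.0   # 2 bins along y
h = fit(Histogram, (x, y), (xedges, yedges))

println(size(h.weights))   # rows follow x, columns follow y
println(h.weights[1, 2])   # count for x in [0,1), y in [1,2)
```

Plots' heatmap(xc, yc, z) expects z to have length(yc) rows and length(xc) columns, i.e. rows follow y. That would explain why transposing h.weights (or using permutedims(h.weights)) is needed when x is built from h.edges[1] and y from h.edges[2].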
