Julia equivalent of scipy binned_statistic_dd

jpmorr · July 8, 2021, 8:45am

I’m a Python user and I’ve just started experimenting with Julia to see if it is as quick as I keep reading.

I’m usually dealing with 3D data, and one of the most frequently used functions in my workflow is SciPy’s binned_statistic_dd. I’m trying to find Julia’s equivalent, but all I’ve found is StatsBase.Histogram, which seems to be just standard binning without the 3D abilities or the statistics.

Is there a direct replacement? I don’t really want to have to write one myself (at least not now anyway) just to get started comparing to Python.

nilshg · July 8, 2021, 9:14am

Why do you say StatsBase.Histogram doesn’t do 3D? There’s a multivariate example in the docs you linked, which generalizes to any number of dimensions:

julia> fit(Histogram, (rand(100), rand(100), rand(100), rand(100), rand(100)))
Histogram{Int64, 5, NTuple{5, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0

That said, I don’t see an easy way to leverage this to apply different functions to the data points in each bin (which iiuc is your ultimate goal?), as I don’t think you can access the points → bin mapping from the histogram object.

jpmorr · July 8, 2021, 10:25am

@nilshg OK, that’s definitely a case of me not reading the docs carefully enough, as I missed the 2D example. But looking at it now, it still seems to be different, as it’s binning multiple vectors against a single set of bins, whereas what I want to do is bin a single set of data against 3D bins, like:

h = fit(Histogram, (rand(1000,3)), nbins=(40, 5, 9))

In this case I have 1000 coordinates in x, y, and z, and I want to bin that 3D data into a 3D grid with a specified number of bins in each direction. I don’t think Histogram will do that, at least from what I can see in the documentation anyway.

I think if you get counts and bin indices returned, like in SciPy or NumPy, it would be easy enough to work out the statistics after the fact, but I’m not sure the counts and IDs for each bin are returned by Histogram.
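On the counts part of that question: a fitted Histogram keeps the per-bin counts in its weights field and the bin edges in its edges field. The following is a minimal sketch, assuming the coordinates are passed as a tuple of three vectors (the names xs, ys, zs are made up for illustration). Note that nbins is documented as an approximate request, so the resulting grid may not be exactly 40×5×9.

```julia
using StatsBase

# Hypothetical sample data: 1000 points given as separate x, y, z vectors
xs, ys, zs = rand(1000), rand(1000), rand(1000)

# nbins is only an approximate request per dimension
h = fit(Histogram, (xs, ys, zs); nbins=(40, 5, 9))

counts = h.weights   # 3D array of counts, one entry per bin
edges  = h.edges     # tuple of three ranges, one per axis

# The actual grid size follows from the edges StatsBase chose
grid_size = length.(edges) .- 1
```

Comparing size(counts) with grid_size shows what grid was actually used.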

sethaxen · July 9, 2021, 9:50am

This should do what you want:

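using StatsBase

x = rand(1000,3)  # 1000 points, one row per (x, y, z) coordinate
h = fit(Histogram, Tuple(eachcol(x)); nbins=(40, 5, 9))
# bin index (i, j, k) of each point
bin_ids = map(xi -> StatsBase.binindex(h, Tuple(xi)), eachrow(x))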
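From the bin indices, a per-bin statistic can be built by grouping values by bin and reducing each group. Below is a rough sketch of a binned mean, not a full binned_statistic_dd replacement. The value vector v and the names groups and binned_mean are assumptions for illustration, and StatsBase.binindex is not exported, so it may be best treated as internal API.

```julia
using StatsBase, Statistics

x = rand(1000, 3)   # 1000 points in 3D, one row per point
v = rand(1000)      # hypothetical value attached to each point

# nbins is an approximate request; size(h.weights) gives the actual grid
h = fit(Histogram, Tuple(eachcol(x)); nbins=(40, 5, 9))

# Bin index (i, j, k) of each point; binindex is unexported (assumed stable here)
bin_ids = map(xi -> StatsBase.binindex(h, Tuple(xi)), eachrow(x))

# Group the values by the bin their point falls into
groups = Dict{NTuple{3,Int}, Vector{Float64}}()
for (id, val) in zip(bin_ids, v)
    push!(get!(groups, id, Float64[]), val)
end

# Apply the statistic per bin; empty bins stay NaN
binned_mean = fill(NaN, size(h.weights))
for (id, vals) in groups
    binned_mean[id...] = mean(vals)
end
```

Swapping mean for median, std, or any other reduction gives the other statistics. Building the Dict once keeps the grouping reusable across several statistics.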

jpmorr · July 9, 2021, 11:06am

@sethaxen Thanks, this is great. It’s right along the path to what I want to do. I can add the stats part later with a new function. It would be good to see an example like this in the documentation to help people get started with some more advanced things.
