Julia equivalent of scipy binned_statistic_dd

I’m a Python user and I’ve just started experimenting with Julia to see if it is as quick as I keep reading.

I’m usually dealing with 3D data, and one of the most frequently used functions in my workflow is SciPy’s binned_statistic. I’m trying to find Julia’s equivalent, but all I’ve found is StatsBase.Histogram, which seems to be just standard binning without the 3D abilities or the statistics.

Is there a direct replacement? I don’t really want to have to write one myself (at least not right now) just to get started comparing to Python.

Why do you say StatsBase.Histogram doesn’t do 3D? There’s a multivariate example in the docs you linked, which generalizes to any number of dimensions:

julia> fit(Histogram, (rand(100), rand(100), rand(100), rand(100), rand(100)))
Histogram{Int64, 5, NTuple{5, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}
edges:
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0
  0.0:0.2:1.0

That said, I don’t see an easy way to leverage this to apply different functions to the data points in each bin (which, iiuc, is your ultimate goal?), as I don’t think you can access the points → bin mapping from the histogram object.

@nilshg OK, that’s definitely a case of me not reading the docs carefully enough, as I missed the 2D example. But looking at it now, it still seems to be different: it’s binning multiple vectors against a single set of bins, whereas what I want to do is bin a single set of data against 3D bins, like:

h = fit(Histogram, (rand(1000,3)), nbins=(40, 5, 9))

In this case I have 1000 coordinates in x, y, and z, and I want to bin that 3D data into a 3D grid with a specified number of bins in each direction. I don’t think Histogram will do that, at least from what I can see in the documentation.

I think if you get the counts and indices returned, like in SciPy or NumPy, it would be easy enough to work out the statistics after the fact, but I’m not sure the counts and IDs for each bin are returned by Histogram.

This should do what you want:

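using StatsBase
x = rand(1000, 3)  # 1000 points, one row per (x, y, z) coordinate
# Pass each column as its own vector; nbins is the approximate number of bins per dimension
h = fit(Histogram, Tuple(eachcol(x)); nbins=(40, 5, 9))
# Tuple of per-dimension bin indices for every point
bin_ids = map(xi -> StatsBase.binindex(h, Tuple(xi)), eachrow(x))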

@sethaxen Thanks, this is great. It’s right along the path to what I want to do. I can add the stats part later with a new function. It would be good to see an example like this in the documentation to help people get started with some more advanced things.
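
For anyone landing here later, the stats part could look roughly like the sketch below. It is only a sketch, not an existing StatsBase function: binned_statistic is a hypothetical helper name, vals is made-up per-point data, and marking empty bins with NaN is a choice made for this example. It groups the values by the bin index that StatsBase.binindex returns and then applies an arbitrary function to each group.

```julia
using StatsBase
using Statistics  # for mean

# Hypothetical helper (not part of StatsBase): apply `stat` to the values
# whose points fall into each bin of the fitted histogram `h`.
function binned_statistic(stat, h::Histogram, points::AbstractMatrix, values::AbstractVector)
    T = eltype(values)
    N = ndims(h.weights)
    # Collect the values belonging to each N-dimensional bin index.
    groups = Dict{NTuple{N,Int},Vector{T}}()
    for (p, v) in zip(eachrow(points), values)
        idx = StatsBase.binindex(h, Tuple(p))
        push!(get!(() -> T[], groups, idx), v)
    end
    # Assumption for this sketch: bins without any points are left as NaN.
    out = fill(NaN, size(h.weights))
    for (idx, vs) in groups
        # Skip points that fall outside the histogram edges.
        checkbounds(Bool, out, idx...) || continue
        out[idx...] = stat(vs)
    end
    return out
end

x = rand(1000, 3)     # 1000 points in 3D, one row per point
vals = rand(1000)     # illustrative data: one value attached to each point
h = fit(Histogram, Tuple(eachcol(x)); nbins=(40, 5, 9))
means = binned_statistic(mean, h, x, vals)
```

Here means has the same shape as h.weights (the per-bin counts), so the two arrays can be indexed together. Since nbins is only an approximate target, that shape is not guaranteed to be exactly (40, 5, 9). Swapping mean for median, sum, or any other function of a vector should work the same way.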