I have a bunch of coordinates and corresponding values:
n = 100
xy = [rand(2) for _ in 1:n]
v = rand(n)
I want to bin the coordinates in 2D and calculate the mean of all the values v that fall into each bin.
For example, if only these two coordinate-value pairs fell into the same bin (e.g. with edges [0, 0.2) × [0, 0.2)):
xy1 = (0.1, 0.1)
v1 = 0.0
xy2 = (0.01, 0.19)
v2 = 1.0
then I expect the bin containing them to hold the mean of their values: (0 + 1)/2 = 0.5.
Do you know of any “packaged” operation that can accomplish this?
The only way I can think of is:
In the standard form of a histogram, each bin contains the count of the points that fall into it (ignoring the values), and in the weighted form (see the StatsBase docs), each bin contains the sum of the values. So I could calculate the weighted and unweighted histograms and divide the weighted one by the unweighted one, bin by bin, to get what I want:
using StatsBase
x = first.(xy)
y = last.(xy)
edges = (0:0.2:1, 0:0.2:1)
uw = fit(Histogram, (x, y), edges)             # count of points per bin
w = fit(Histogram, (x, y), weights(v), edges)  # sum of v per bin
h = w.weights ./ uw.weights                    # mean of v per bin (NaN where a bin is empty)
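
For comparison, here is a minimal sketch of the same sum-divided-by-count idea written by hand with Base Julia only. The name binned_mean is hypothetical (it is not part of StatsBase), and I am assuming left-closed, right-open bins to match the [0, 0.2) convention above. It is checked against the two-point example from earlier in the post:

```julia
# Hypothetical helper, not a StatsBase function: mean of v in each 2D bin.
# Assumes sorted edges and left-closed, right-open bins, e.g. [0, 0.2).
function binned_mean(x, y, v, xedges, yedges)
    nx, ny = length(xedges) - 1, length(yedges) - 1
    sums = zeros(nx, ny)         # running sum of v per bin
    counts = zeros(Int, nx, ny)  # number of points per bin
    for (xi, yi, vi) in zip(x, y, v)
        i = searchsortedlast(xedges, xi)
        j = searchsortedlast(yedges, yi)
        # Skip points outside the edges (a point exactly on the last edge is also skipped)
        (1 <= i <= nx && 1 <= j <= ny) || continue
        sums[i, j] += vi
        counts[i, j] += 1
    end
    return sums ./ counts  # empty bins give 0.0 / 0 = NaN
end

# The two example points from above, both in the first bin
h = binned_mean([0.1, 0.01], [0.1, 0.19], [0.0, 1.0], 0:0.2:1, 0:0.2:1)
println(h[1, 1])
```

This only needs one pass over the data, but it is still the same weighted-over-unweighted division, so it does not answer whether a ready-made function exists.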