A histogram that averages metadata instead of counts

Hi @yakir12, what you describe sounds a lot like the EmpiricalHistogram in GeoStats.jl. It uses the coordinates of the samples to correct for clustering effects, and it is defined for N-dimensional spatial domains, not only 2D. Suppose you have some data in a vector z and coordinates in a matrix X (rows are dimensions (x, y, z, …) and columns are samples). The first thing you need to do is georeference the data:

using GeoStats
using Statistics # provides mean, used further down

# data and coordinates
z = rand(100)
X = rand(2, 100)

# spatial data
𝒟 = georef((z=z,), X)
100 PointSet{Float64,2}
  variables
    └─z (Float64)

Now GeoStats.jl knows where the samples are in the spatial domain and can do fancy things with this dataset that wouldn't be possible without the coordinates. You can compute the histogram of the z variable with:

h = EmpiricalHistogram(𝒟, :z)

And optionally pass the size of the "bins":

h = EmpiricalHistogram(𝒟, :z, 0.1)
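
To give a feeling for what "correcting for clustering" can mean, here is a minimal sketch of classic cell declustering in plain Julia. It is not necessarily what EmpiricalHistogram does internally; the helper name `cell_declustering_weights` and the cell size are made up for illustration. Each sample gets a weight inversely proportional to the number of samples sharing its cell, so densely sampled regions do not dominate the statistics.

```julia
using Statistics

# Hypothetical helper (not part of GeoStats.jl): weight each sample by the
# inverse of the number of samples that fall in the same square cell.
function cell_declustering_weights(X::AbstractMatrix, cellsize::Real)
    # X is dims × samples, as above; map each column to an integer cell index
    cells = [Tuple(floor.(Int, X[:, j] ./ cellsize)) for j in 1:size(X, 2)]

    # count how many samples land in each cell
    counts = Dict{eltype(cells),Int}()
    for c in cells
        counts[c] = get(counts, c, 0) + 1
    end

    w = [1 / counts[c] for c in cells]
    return w ./ sum(w) # normalize so the weights sum to one
end

z = rand(100)
X = rand(2, 100)
w = cell_declustering_weights(X, 0.1)

naive_mean = mean(z)           # every sample counts the same
declustered_mean = sum(w .* z) # clustered samples are down-weighted
```

With uniformly random coordinates the two means will be close; the difference shows up when samples are bunched together in a few places.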

There is a lot going on in this procedure. If you prefer to do things by hand, you can leverage the partitioning algorithms in the framework. For example, you can partition your spatial dataset with blocks (or bins) of given size:

ℬ = partition(𝒟, BlockPartitioner(0.1,0.2))
50 SpatialPartition
  N° points
  └─2
  └─2
  └─1
  └─2
  └─2
  ⋮
  └─1
  └─1
  └─1
  └─1
  └─1
  metadata: neighbors

And each block is a spatial dataset:

ℬ[1]
2 DomainView{Float64,2}
  variables
    └─z (Float64)

That means you can apply normal functions on a per block basis:

μ₁ = mean(ℬ[1][:z])
0.49119514111824636

μ₂ = mean(ℬ[2][:z])
0.39853597367487636
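
If you want to see the same per-block idea without any spatial machinery, here is a rough sketch in plain Julia that groups samples into rectangular blocks and averages z inside each one. The helper `block_means` is hypothetical (it is not a GeoStats.jl function), and the block sides 0.1 and 0.2 are just borrowed from the partition above. The result is essentially a histogram over space that stores an average instead of a count.

```julia
using Statistics

# Hypothetical helper (not part of GeoStats.jl): mean of z inside each
# occupied rectangular block, keyed by the block's integer index.
function block_means(z::AbstractVector, X::AbstractMatrix, sides)
    groups = Dict{Any,Vector{eltype(z)}}()
    for j in eachindex(z)
        # integer block index of sample j along each dimension
        key = Tuple(floor.(Int, X[:, j] ./ sides))
        push!(get!(groups, key, eltype(z)[]), z[j])
    end
    return Dict(k => mean(v) for (k, v) in groups)
end

z = rand(100)
X = rand(2, 100)
μ = block_means(z, X, [0.1, 0.2]) # block index => mean of z in that block
```

Only blocks that contain at least one sample appear in the result, so empty blocks never produce a NaN.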

For more information about these procedures, you can watch the tutorial on spatial declustering:
