I have a big map-reduce operation on a large radio telescope FFT dataset, which accumulates counts into separate histograms for roughly 11-12 metrics. Now that this is working, I want to capture the intersections between these metrics better, not just each histogram on its own.
Just thinking out loud: it seems my options are either to capture it flat, like
[bin1ID, bin2ID, bin3ID, ...] and aggregate later, or to increment the weights directly into some kind of 12-dimensional tensor. I have clear bins defined by the underlying theories (Poisson point process, etc.), so there’s no need to bin dynamically. The map-reduce is parallel, so a remote worker’s data must be merged with a master copy; this works great so far with
Any bright ideas for an efficient data structure for this? Thanks for your thoughts.
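To make the flat option concrete, here is a rough sketch of what I have in mind, assuming each record has already been reduced to a tuple of per-metric bin IDs. The names `map_chunk`, `merge`, and `marginal` are hypothetical and only for illustration, and the example uses 3 metrics instead of 12 to keep it short. It stores only the bin combinations that actually occur, so it behaves like a sparse version of the 12-dimensional tensor.

```python
from collections import Counter

def map_chunk(records):
    """Worker side: count how often each combination of bin IDs occurs.

    Assumption: each record is an iterable of per-metric bin IDs,
    e.g. (bin1ID, bin2ID, bin3ID, ...).
    """
    counts = Counter()
    for bin_ids in records:
        counts[tuple(bin_ids)] += 1
    return counts

def merge(master, worker):
    """Master side: add a worker's joint counts into the master copy."""
    master.update(worker)  # Counter.update adds counts rather than replacing them
    return master

def marginal(joint, axis):
    """Recover the ordinary 1-D histogram for one metric from the joint counts."""
    out = Counter()
    for key, n in joint.items():
        out[key[axis]] += n
    return out

# Two hypothetical workers, three metrics each.
worker_a = map_chunk([(0, 1, 2), (0, 1, 2), (3, 1, 0)])
worker_b = map_chunk([(0, 1, 2), (3, 3, 3)])

master = Counter()
merge(master, worker_a)
merge(master, worker_b)
```

Merging is just addition, so it is associative and order-independent, which is what the parallel reduce step needs. The trade-off against a dense tensor is memory proportional to the number of occupied cells versus the product of all bin counts.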