Usage of different types of weights

Hi @jeffwong, thank you for the links; they are very useful for putting this topic into context. From what I understand, the distinction between weight types serves two purposes: 1) accounting for duplicates in the data, or 2) performing regression on aggregated points.

My understanding is that 1) is a design choice: either ask the user to remove duplicates from the data before applying regression, or adjust the regression itself to accommodate the repetition explicitly. If the two are equivalent, and there is no situation where cleaning the data beforehand fails to solve the issue, then I don’t see much value in modifying the implementation with weights. To me it feels like unnecessary complexity. Please let me know if they are not equivalent; I’d be interested to learn.
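To make the equivalence I have in mind concrete, here is a minimal sketch in plain Python (chosen only to keep it self-contained). The data, the `counts` vector, and the helper `weighted_fit` are all made up for illustration and are not taken from any package. It only compares the fitted coefficients of a simple linear regression.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Made-up data: three distinct rows; counts say how often each row is repeated.
x = [1.0, 2.0, 4.0]
y = [1.5, 1.9, 4.2]
counts = [3, 1, 2]

# Option A: expand the duplicates and fit with unit weights.
x_dup = [xi for xi, c in zip(x, counts) for _ in range(c)]
y_dup = [yi for yi, c in zip(y, counts) for _ in range(c)]
fit_dup = weighted_fit(x_dup, y_dup, [1.0] * len(x_dup))

# Option B: keep only the distinct rows and pass the counts as weights.
fit_w = weighted_fit(x, y, counts)

print(fit_dup, fit_w)  # the two (a, b) pairs agree up to rounding
```

Both options minimize the same sum of squares, so the coefficients coincide; the sketch says nothing about standard errors.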

For 2), the lecture notes you linked have an example with an individual/village regression where they state that the “random unit” is the village, which is an aggregate of individuals. In geostatistics, this is a well-known problem in which one has to perform estimation or inference on blocks that have a different support than the samples. We have developed plenty of methods for this problem that take this weighting into account, but at no point did we have to define different weight types explicitly. Sadly, GeoStats.jl doesn’t have these methods implemented yet, so I cannot demonstrate what I mean, but they will be there at some point.
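For the village example, here is a minimal sketch of what regression on aggregated points can look like. It rests on assumptions of mine that are not taken from the notes: the covariate is constant within each village, and the weight of each village is its number of individuals. The `villages` data and the helper `weighted_fit` are hypothetical, and this is not one of the geostatistical change-of-support methods mentioned above.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

# Hypothetical villages: village-level covariate -> individual responses.
villages = {
    1.0: [2.1, 1.9, 2.3],
    2.0: [3.0, 3.4],
    3.0: [4.8, 5.1, 5.0, 4.7],
}

# Individual-level fit: one row per individual, unit weights.
x_ind = [xv for xv, ys in villages.items() for _ in ys]
y_ind = [yi for ys in villages.values() for yi in ys]
fit_ind = weighted_fit(x_ind, y_ind, [1.0] * len(x_ind))

# Aggregated fit: one row per village (the mean response), weighted by size.
x_agg = list(villages)
y_agg = [sum(ys) / len(ys) for ys in villages.values()]
sizes = [len(ys) for ys in villages.values()]
fit_agg = weighted_fit(x_agg, y_agg, sizes)

print(fit_ind, fit_agg)  # same coefficients up to rounding
```

Under these assumptions the size-weighted fit on village means recovers the individual-level coefficients, even though the individual responses are no longer available.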

With that said, it is still good to see weight types in Julia, since they allow multiple dispatch to trigger the appropriate variant of the estimator. This is especially useful for 2), when the data comes already aggregated and there is no way to undo the aggregation.