I’d like to propose two additional options for the `merge` function. I’m not sure whether this belongs in the core language or somewhere like DataFrames, but as long as `merge` is in core, it seems reasonable to put it here (though input is welcome!).
In particular, I’d like to propose offering the following two options as optional keywords:
`validate`: duplicates the functionality of the `validate` keyword in the pandas `merge` function. Accepts `"1:1"`, `"1:m"`, `"m:1"`, or `"m:m"`, and raises an exception if the merge is not 1-to-1, 1-to-many, many-to-1, or many-to-many (respectively).
`indicator`: duplicates the functionality of the `indicator` keyword in the pandas `merge` function. If `True`, adds a column to the returned object that records whether each resulting row has data from both datasets (`both`), from the left dataset only (`left_only`), or from the right dataset only (`right_only`). If we’d prefer numerics for generality, these could be 1, 2, and 3.
(Both actually replicate behavior from Stata’s `merge` command.)
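To make the proposed semantics concrete, here is a minimal sketch in plain Python (stdlib only, so not tied to any particular dataframe library) of a full outer merge over lists of dicts that supports both keywords. `merge_rows` and `MergeError` are hypothetical names. The `_merge` column name is borrowed from pandas’ default for `indicator`. This illustrates the behavior only and is not how pandas or any other library implements it.

```python
from collections import Counter


class MergeError(ValueError):
    """Hypothetical error raised when keys violate the requested `validate` relationship."""


def _check_validate(left_keys, right_keys, validate):
    # A "1" on either side of the colon means keys on that side must be unique.
    if validate not in {"1:1", "1:m", "m:1", "m:m"}:
        raise ValueError(f"unknown validate option: {validate!r}")
    left_side, right_side = validate.split(":")
    if left_side == "1" and any(n > 1 for n in Counter(left_keys).values()):
        raise MergeError(f"keys are not unique in left dataset; not a {validate} merge")
    if right_side == "1" and any(n > 1 for n in Counter(right_keys).values()):
        raise MergeError(f"keys are not unique in right dataset; not a {validate} merge")


def merge_rows(left, right, on, validate=None, indicator=False):
    """Hypothetical full outer merge of two lists of dicts on the column `on`.

    Missing columns are simply absent (not filled), and same-named columns
    are taken from the right row (no suffixing, unlike pandas).
    """
    if validate is not None:
        _check_validate([r[on] for r in left], [r[on] for r in right], validate)

    # Group right rows by key so each left row can find all of its matches.
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[on], []).append(rrow)
    left_keys = {lrow[on] for lrow in left}

    result = []
    for lrow in left:
        matches = right_by_key.get(lrow[on], [])
        if matches:
            for rrow in matches:
                merged = {**lrow, **rrow}
                if indicator:
                    merged["_merge"] = "both"
                result.append(merged)
        else:
            merged = dict(lrow)
            if indicator:
                merged["_merge"] = "left_only"
            result.append(merged)

    # Right rows whose key never appears on the left side.
    for rrow in right:
        if rrow[on] not in left_keys:
            merged = dict(rrow)
            if indicator:
                merged["_merge"] = "right_only"
            result.append(merged)
    return result


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 2, "y": "c"}, {"id": 3, "y": "d"}]
rows = merge_rows(left, right, on="id", validate="1:1", indicator=True)
print([r["_merge"] for r in rows])  # ['left_only', 'both', 'right_only']
```

Passing `validate="1:1"` with a duplicated key on either side raises `MergeError` before any rows are produced. This is the early, loud failure that makes key problems visible at merge time rather than downstream.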
Personally, I find these exceedingly valuable when working with real-world data, since nowhere do problems become more evident than in merges, and it gets exhausting writing code that replicates these features every time I merge (especially the