I thought of an option that seems a natural extension of leftjoin
. Suppose that you have two dataframes, where one is a strict subset of the other.
df1 = DataFrame(ID=[1, 2, 3], Name=["John Doe", "Jane Doe", "Joe Blogs"], country=["US", "US", "US"])
3×3 DataFrame
Row │ ID Name country
│ Int64 String String
─────┼───────────────────────────
1 │ 1 John Doe US
2 │ 2 Jane Doe US
3 │ 3 Joe Blogs US
df2 = DataFrame(ID=[2], Name=["Jane Doe"], country=["CA"])
1×3 DataFrame
Row │ ID Name country
│ Int64 String String
─────┼──────────────────────────
1 │ 2 Jane Doe CA
Imagine you want to update the value of df1.country
based on df2
. It seems natural to have:
leftjoin!(df1, df2, on =[:ID,:Name]; replacement=true).
(or some other name, like subset=true
), whose result would be
Row │ ID Name country
│ Int64 String String
─────┼───────────────────────────
1 │ 1 John Doe US
2 │ 2 Jane Doe CA
3 │ 3 Joe Blogs US
I was thinking that this could be useful if you had df1
, but after several months there are updates to information or corrections, and they are presented as in df2
.
Best