Merge where date ranges match

“I have two tables that need to be combined. One table contains basic information about deals and another table information about the associated products.Each row consists of a state a deal was in from Start Date Deal to End Date Deal. If one of the attributes changes, I have a new row with the new information and the date range for which this information held true. The same applies to the products table.”

"Start Date Deal,End Date Deal,Deal attributes\n1.1.2021,1.2.2021,status 1\n2.2.2021,3.3.2021,status 2\n4.3.2021,15.6.2021,status 3\n" 
"Product ID,Start Date Product,End Date Product,Product attributes\n234,15.1.2021,2.2.2021,Price 1\n234,3.2.2021,15.6.2021,Price 2\n456,15.1.2021,4.2.2021,Price 5\n456,5.2.2021,10.6.2021,Price 6\n"

In order to work with the data, I need a table that contains the information from both tables. The problem is that I have Date Ranges and this approach should be robust to this.

"Product ID,Start Date Phase,End Date Phase,Deal attributes,Product attributes\n234,1.1.2021,14.1.2021,status 1,\n234,15.1.2021,1.2.2021,status 1,Price 1\n234,2.2.2021,2.2.2021,status 2,Price1\n234,3.2.2021,3.3.2021,status 2,Price 2\n234,4.3.2021,15.6.2021,status 3,Price 2\n456,1.1.2021,14.1.2021,status 1,\n456,15.1.2021,1.2.2021,status 1,Price 5\n456,2.2.2021,4.2.2021,status 2,Price 5\n456,5.2.2021,3.3.2021,status 2,Price 6\n456,4.3.2021,10.6.2021,status 3,Price 6\n456,11.6.2021,15.6.2021,status 3,\n"

julia> result
11×5 DataFrame
 Row │ Product ID  Start Date Phase  End Date Phase  Deal attributes  Product attributes 
     │ Any         Any               Any             Any              Any
   1 │ 234         1.1.2021          14.1.2021       status 1         missing
   2 │ 234         15.1.2021         1.2.2021        status 1         Price 1
   3 │ 234         2.2.2021          2.2.2021        status 2         Price1
   4 │ 234         3.2.2021          3.3.2021        status 2         Price 2
   5 │ 234         4.3.2021          15.6.2021       status 3         Price 2
   6 │ 456         1.1.2021          14.1.2021       status 1         missing
   7 │ 456         15.1.2021         1.2.2021        status 1         Price 5
   8 │ 456         2.2.2021          4.2.2021        status 2         Price 5
   9 │ 456         5.2.2021          3.3.2021        status 2         Price 6
  10 │ 456         4.3.2021          10.6.2021       status 3         Price 6
  11 │ 456         11.6.2021         15.6.2021       status 3         missing

@sl-solution, I think this should be suitable for range join that you mentioned in ANN, am I right?


I don’t know IMD, let alone the “join range” function.
I don’t remember participating in the discussion you mention.
At first glance, the new package looks very attractive.
I will certainly try to verify if with this it is easier to solve the problem I posed.
In fact I haven’t solved it yet using julia (dataframes or other) even though I know how to do it, because I solved it in another context.
What I was looking for is to see how (in various ways) this problem can be faced which, even if particular, seems to me quite typical.
It would be interesting to see if there are strategies that are more “robust” or more “efficient” in some sense or simply easier to implement.

No it doesn’t fit to range join because that kind of joins accept one column from the left data set and at most two columns from the right data set, however, here we have two columns in the left data set and two columns in right data set.

For this problem, i guess the author needs to break each deal range to multiple parts and finds the status of product attr at each part, e.g. status 1 range 1.1.21--1.2.21 should be breaks into two segments 1.1.21--14.1.21 and 15.1.21--1.2.21 where for the first segment no price information exists so the result is missing but for the second one Price 1 is chosen. However, there are some complication which I don’t understand for 2.2.21--2.2.21 there are two possibilities 1 for status 1 and another one for status 2 (id 234), and only one of them appears in the output data set.

The status table should look like this, if I haven’t messed around turning it to string, as @nilshg explained to me

My interpretation is that the requirement is unambiguous.
In particular, for the change of state, status1 ends on 1/2/2021 and the next day (2/2/2021) begins status2.