I have a DataFrame with episodes of exposure of the form:
PersonId::Int64
DateOfBirth::Date
Sex::Int64
StartDate::Date
EndDate::Date
- I would like to split each record on January 1, of each calendar year between
:StartDate
and:EndDate
, e.g. where year start is>:StartDate && <= :EndDate
. An episode can span several years. - Then I would like to split each record on each annual birthday of the person, if the birthday is
>:StartDate && <= :EndDate
- When an episode is split, the end date of the left-hand portion will be split date - 1 day, and the start date of the right-hand portion will be equal to the split date. Note: This could result in a right-hand portion where start date equals end date, if the split date equals the end date - this is fine.
-
:PersonId, :DateOfBirth, :Sex
should be retained for all resulting records. - For those familiar with Stata, this is equivalent to the
stsplit
command.
I can think of a brute force method of iterating across all records, testing the year span of each episode and appending split records to an empty DataFrame, but I guess there might be a more elegant way of doing this using the amazing features of DataFrames.
Any help will be appreciated, thank you.