I have a few sizable data sets in which each row contains the actions of one person on different days. I want to summarize the data as a frequency table that shows the number of times a specific action is followed by another action. For example, in the following data set ‘A’ is followed by ‘A’ 3 times, etc.
```julia
data = Dataset(id=[1,2,3,4], day1=['A','B','A','A'], day2=['C','A','D','A'], day3=[missing,'A','A','A'])
expected = Dataset(action1=['A','C','B','A','A','D'], action2=['C',missing,'A','A','D','A'], count=[1,1,1,3,1,1])
```
I prefer an InMemoryDatasets solution, but I am open to answers using DataFrames.
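For reference, here is a minimal sketch of the DataFrames route, assuming the day columns are all named `day…` and already appear in chronological order. It walks each row, records every (action, next action) pair, and counts the pairs with `groupby`/`combine`. The names `daycols`, `transitions` and `freq` are hypothetical and used only for illustration; this is not an InMemoryDatasets API.

```julia
using DataFrames

# The question's sample data, rebuilt as a DataFrame
# (assumption: DataFrames instead of InMemoryDatasets).
df = DataFrame(id = [1, 2, 3, 4],
               day1 = ['A', 'B', 'A', 'A'],
               day2 = ['C', 'A', 'D', 'A'],
               day3 = [missing, 'A', 'A', 'A'])

# Assumption: every "day..." column holds one action and the columns
# are already in chronological order.
daycols = names(df, r"^day")

# Collect one (action, next action) pair per pair of consecutive days, row by row.
transitions = DataFrame(action1 = Union{Missing,Char}[],
                        action2 = Union{Missing,Char}[])
for row in eachrow(df)
    acts = [row[c] for c in daycols]
    for i in 1:length(acts)-1
        push!(transitions, (acts[i], acts[i+1]))
    end
end

# Count each distinct pair; groupby keeps missing as a group value by default,
# so ('C', missing) is counted like any other pair.
freq = combine(groupby(transitions, [:action1, :action2]; sort = false),
               nrow => :count)
println(freq)
```

On the sample data this should give the same six pairs and counts as `expected`, although the row order of the result may depend on the DataFrames version. If pairs that start with a missing action are not wanted, `dropmissing(transitions, :action1)` could be applied before grouping. The row-by-row `push!` loop is chosen for clarity, not speed.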