How to find frequency count for consecutive actions?

I have a few sizable data sets with rows contains actions of a person in different days. I want to summarize the data as a frequency table that shows the number of times that a specific action followed by another action. for example in the following data set ‘A’ followed by ‘A’ 3 times, etc.

``````data=Dataset(id=[1,2,3,4],day1=['A','B','A','A'],day2=['C','A','D','A'],day3=[missing,'A','A','A'])
expected=Dataset(action1=['A','C','B','A','A','D'],action2=['C',missing,'A','A','D','A'],count=[1,1,1,3,1,1])
``````

I prefer `InMemoryDatasets` solution but open to answers using `DataFrames`.

I also tried a version with IMD even if I’m not familiar with the various possibilities and maybe it can be improved

``````
df=DataFrame(data)

dfres=DataFrame(vcat([df.day1 df.day2], [df.day2 df.day3]),[:act1, :act2])

DataFrames.combine(DataFrames.groupby(dfres, [:act1, :act2]), nrow)
``````
``````
dfres=Dataset(vcat([df.day1 df.day2], [df.day2 df.day3]),[:act1, :act2])

InMemoryDatasets.combine(InMemoryDatasets.groupby(dfres, [:act1, :act2]),:act1=>length)

``````
``````
function countActChg(x,y,z)
d=countmap([Pair.(x,y);Pair.(y,z)])
(k=collect(keys(d)),v=collect(values(d)))
end

DataFrames.combine(df, 2:4 =>countActChg=>AsTable)

ulia> DataFrames.combine(df, 2:4 =>countActChg=>AsTable)
6×2 DataFrame
Row │ k             v
│ Pair…         Int64
─────┼─────────────────────
1 │ 'A'=>'C'          1
2 │ 'A'=>'D'          1
3 │ 'C'=>missing      1
4 │ 'B'=>'A'          1
5 │ 'A'=>'A'          3
6 │ 'D'=>'A'          1
``````
``````DataFrames.combine(df, 2:4 =>countAC=>AsTable)

function countAC(x,y,z)
d=countmap([tuple.(x,y);tuple.(y,z)])
(k=collect(keys(d)),v=collect(values(d)))
end
``````

why in this solution it is necessary to use the collect function (without it does not work)?
is there a more synthetic way to simulate the same transformation?

what happens if there are more columns in data set, your solution supposes only 3 days are in data set?

this is the first way that comes to mind, but I think it can also be done through reduce or with a recursive function

``````df=DataFrame(id=[1,2,3,4],day1=['A','B','A','A'],day2=['C','A','D','A'],day3=[missing,'A','A','A'],day4=['A','B','A',missing],day5=['C',missing,'D','A'])

reshape(Matrix(df[:,2:end]),:,2)
``````

or

``````using IterTools
vcat([[f l] for (f,l) in partition(eachcol(df[:,r"day"]),2,1)]...)
``````
``````
cols=eachcol(df)
n=ncol(df)
reduce((s,c)->vcat(s,[cols[c-1] cols[c]]), 4:n,init=[cols[2] cols[3]])
``````
``````
reduce((s,c)->vcat(s,[s[end-3:end,2] c]), eachcol(df)[4:end];init=[df[:,2] df[:,3]])

``````

Do like this:

-transpose each row

-add lag of days for each id

-remove the first obs for each id

-count each action

``````julia> using Chain
julia> function f(x)
result=ones(Bool,length(x))
result[1]=false
result
end
julia> @chain data begin
groupby(:id)
transpose(r"day")
groupby(:id)
modify!(:_c1=>lag=>:lagd, :_c1=>f=>:temp)
filter(:temp)
groupby([:_c1, :lagd])
combine(:_c1=>length)
end
``````
2 Likes
``````result[end]=false

modify!(:_c1=>lead=>:lagd, :_c1=>f=>:temp)

``````

shoul give the rigth order

use

``````@chain data begin
gatherby(:id, eachrow=true)
transpose((2:ncol(data)-1, 3:ncol(data)))
gatherby(r"_c")
combine(1=>length)
end
``````
3 Likes

thanks, great!

great trick!