How to find frequency count for consecutive actions?

I have a few sizable data sets with rows contains actions of a person in different days. I want to summarize the data as a frequency table that shows the number of times that a specific action followed by another action. for example in the following data set ‘A’ followed by ‘A’ 3 times, etc.

data=Dataset(id=[1,2,3,4],day1=['A','B','A','A'],day2=['C','A','D','A'],day3=[missing,'A','A','A'])
expected=Dataset(action1=['A','C','B','A','A','D'],action2=['C',missing,'A','A','D','A'],count=[1,1,1,3,1,1])

I prefer InMemoryDatasets solution but open to answers using DataFrames.

I also tried a version with IMD even if I’m not familiar with the various possibilities and maybe it can be improved


df=DataFrame(data)

dfres=DataFrame(vcat([df.day1 df.day2], [df.day2 df.day3]),[:act1, :act2])

DataFrames.combine(DataFrames.groupby(dfres, [:act1, :act2]), nrow)

dfres=Dataset(vcat([df.day1 df.day2], [df.day2 df.day3]),[:act1, :act2])

InMemoryDatasets.combine(InMemoryDatasets.groupby(dfres, [:act1, :act2]),:act1=>length)



function countActChg(x,y,z)
d=countmap([Pair.(x,y);Pair.(y,z)])
(k=collect(keys(d)),v=collect(values(d)))
end

DataFrames.combine(df, 2:4 =>countActChg=>AsTable)

ulia> DataFrames.combine(df, 2:4 =>countActChg=>AsTable)
6×2 DataFrame
 Row │ k             v     
     │ Pair…         Int64
─────┼─────────────────────
   1 │ 'A'=>'C'          1
   2 │ 'A'=>'D'          1
   3 │ 'C'=>missing      1
   4 │ 'B'=>'A'          1
   5 │ 'A'=>'A'          3
   6 │ 'D'=>'A'          1
DataFrames.combine(df, 2:4 =>countAC=>AsTable)

function countAC(x,y,z)
d=countmap([tuple.(x,y);tuple.(y,z)])
(k=collect(keys(d)),v=collect(values(d)))
end

why in this solution it is necessary to use the collect function (without it does not work)?
is there a more synthetic way to simulate the same transformation?

what happens if there are more columns in data set, your solution supposes only 3 days are in data set?

this is the first way that comes to mind, but I think it can also be done through reduce or with a recursive function

df=DataFrame(id=[1,2,3,4],day1=['A','B','A','A'],day2=['C','A','D','A'],day3=[missing,'A','A','A'],day4=['A','B','A',missing],day5=['C',missing,'D','A'])


 reshape(Matrix(df[:,2:end]),:,2) 

or

using IterTools
vcat([[f l] for (f,l) in partition(eachcol(df[:,r"day"]),2,1)]...)

cols=eachcol(df)
n=ncol(df)
reduce((s,c)->vcat(s,[cols[c-1] cols[c]]), 4:n,init=[cols[2] cols[3]])

reduce((s,c)->vcat(s,[s[end-3:end,2] c]), eachcol(df)[4:end];init=[df[:,2] df[:,3]])

Do like this:

-transpose each row

-add lag of days for each id

-remove the first obs for each id

-count each action

julia> using Chain
julia> function f(x)
            result=ones(Bool,length(x))
            result[1]=false
            result
         end
julia> @chain data begin
           groupby(:id)
           transpose(r"day")
           groupby(:id)
           modify!(:_c1=>lag=>:lagd, :_c1=>f=>:temp)
           filter(:temp)
           groupby([:_c1, :lagd])
           combine(:_c1=>length)
       end
2 Likes
result[end]=false

modify!(:_c1=>lead=>:lagd, :_c1=>f=>:temp)

shoul give the rigth order

use

@chain data begin
     gatherby(:id, eachrow=true)
     transpose((2:ncol(data)-1, 3:ncol(data)))
     gatherby(r"_c")
     combine(1=>length)
end
3 Likes

thanks, great!

great trick!