Constructing Markov Transition Frequency Matrix with DataFrames

I am cross-posting this question from StackOverflow since things seem a bit slow there.

I want to use Julia DataFrames to construct a 3x3 Markov transition matrix, i.e. a frequency matrix that tells me the likelihood of transitioning from each of 3 states to the others. I am trying to learn data frames and I would like to learn the best way to do this. This is more for general learning than about this particular example.

Here’s some code I have tried so far with some example data, but I am not really familiar enough with how to think about data frames to know how to proceed.

Any suggestions? Thank you.


using DataFrames

state=[2,2,3,1,1,3,3,2,1,1,3,1,2,3,2,3,1,2,3,3,1]
statelag=[1,2,2,3,1,1,3,3,2,1,1,3,1,2,3,2,3,1,2,3,3]
df = DataFrame(state=state, statelag=statelag)

markov = combine(groupby(df, [:statelag, :state]), nrow => :cat_countmar)
sort!(markov, [:statelag, :state]) # this gives the number of occurrences of each transition

total = combine(groupby(df, :statelag), nrow => :cat_count)
# this gives the number of occurrences of each state


trans = Array{Float64}(undef, (3,3))
# trans should give probability of transitioning between different states

I basically need to “divide” cat_countmar by cat_count, so that the number of occurrences of a transition from state i to state j is divided by the number of occurrences of state i. This will give the desired transition frequency. But I don’t see how to put markov and total together in one data frame and easily carry out this computation.
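One possible way to carry out that division is to join the two tables on :statelag, so each transition row carries its origin state's total, and then divide column by column. This is only a sketch: the names joined and prob are made up for illustration, and it assumes the states are exactly the integers 1 to 3 so they can be used directly as matrix indices.

```julia
using DataFrames

state    = [2,2,3,1,1,3,3,2,1,1,3,1,2,3,2,3,1,2,3,3,1]
statelag = [1,2,2,3,1,1,3,3,2,1,1,3,1,2,3,2,3,1,2,3,3]
df = DataFrame(state = state, statelag = statelag)

# Transition counts (statelag -> state) and per-origin totals, as in the question.
markov = combine(groupby(df, [:statelag, :state]), nrow => :cat_countmar)
total  = combine(groupby(df, :statelag), nrow => :cat_count)

# Attach each origin state's total to its transition rows, then divide.
# `joined` and `prob` are illustrative names, not part of the original code.
joined = innerjoin(markov, total, on = :statelag)
joined.prob = joined.cat_countmar ./ joined.cat_count

# Fill a dense 3x3 matrix; (i, j) pairs that never occur stay at zero.
# Assumes states are the integers 1:3, so they double as indices.
trans = zeros(3, 3)
for r in eachrow(joined)
    trans[r.statelag, r.state] = r.prob
end
```

Each row of trans should then sum to 1, with trans[i, j] the observed frequency of moving from state i to state j.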

Hi all!

Is there any package to build a Markov transition matrix from data?

What does your data look like?

Something like this:

Subj,Period,Group
1, V1, G1
1, V2, G3
2, V1, G2
2, V2, G3
2, V3, G4
3, V1, G3
3, V2, G1
3, V3, G1
4, V1, G1
4, V2, G1
4, V3, G3
4, V4, G2



So each subject independently follows a trajectory over the groups, and you want to learn the Markov chain governing it?

Yes. First I want to know the overall probabilities of changing group; then it would be good to know whether those probabilities are period-dependent, or whether the previous state influences the next transition.

Well for the simplest problem, if you forget about the dataframes and just encode your trajectories as vectors of integers, here’s what it could look like:

function estimate_transitions(trajectories::Vector{Vector{Int}}, N)
    A = zeros(N, N)
    # count transitions
    for traj in trajectories
        for t in 1:length(traj)-1
            A[traj[t], traj[t+1]] += 1.0
        end
    end
    # normalize rows
    @views for i in 1:N
        A[i, :] ./= sum(A[i, :])
    end
    return A
end

Demo with your data:

julia> trajectories = [
           [1, 3],
           [2, 3, 4],
           [3, 1, 1],
           [1, 1, 3, 2]
       ];

julia> estimate_transitions(trajectories, 4)
4×4 Matrix{Float64}:
   0.5         0.0         0.5    0.0
   0.0         0.0         1.0    0.0
   0.333333    0.333333    0.0    0.333333
 NaN         NaN         NaN    NaN
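The trajectories in the demo were typed in by hand. If it helps, here is a rough sketch of how they might be derived from the Subj/Period/Group rows instead. It assumes the labels always look like "V<number>" and "G<number>"; label_to_int and build_trajectories are hypothetical helper names, not functions from any package.

```julia
# Rows from the example table: (Subj, Period, Group).
rows = [
    (1, "V1", "G1"), (1, "V2", "G3"),
    (2, "V1", "G2"), (2, "V2", "G3"), (2, "V3", "G4"),
    (3, "V1", "G3"), (3, "V2", "G1"), (3, "V3", "G1"),
    (4, "V1", "G1"), (4, "V2", "G1"), (4, "V3", "G3"), (4, "V4", "G2"),
]

# Hypothetical helper: strips the one-letter prefix, so "V2" -> 2 and "G3" -> 3.
# Assumes every label is a single ASCII letter followed by digits.
label_to_int(s) = parse(Int, s[2:end])

# Hypothetical helper: groups rows by subject and orders each subject's visits by period.
function build_trajectories(rows)
    by_subject = Dict{Int, Vector{Tuple{Int, Int}}}()
    for (subj, period, group) in rows
        visits = get!(by_subject, subj, Tuple{Int, Int}[])
        push!(visits, (label_to_int(period), label_to_int(group)))
    end
    # Tuples sort by period first; `last` then keeps only the group index.
    return [last.(sort(by_subject[s])) for s in sort(collect(keys(by_subject)))]
end

trajectories = build_trajectories(rows)
```

The result is a Vector{Vector{Int}}, so it can be passed straight to estimate_transitions(trajectories, 4).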

Thank you very much, I will try this code.
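One caveat about the demo output: no trajectory ever leaves state 4, so its row is 0/0 and comes out as NaN. If that is a problem downstream, a variant along these lines leaves such rows at zero instead. The name estimate_transitions_safe is made up here, and whether zero is the right convention for unobserved states is an assumption that depends on the use case.

```julia
# Variant of estimate_transitions that skips normalization for rows with no
# observed outgoing transitions, so they stay all-zero instead of NaN.
# (Hypothetical name; leaving such rows at zero is an assumed convention.)
function estimate_transitions_safe(trajectories, N)
    A = zeros(N, N)
    # count transitions
    for traj in trajectories
        for t in 1:length(traj)-1
            A[traj[t], traj[t+1]] += 1.0
        end
    end
    # normalize only the rows that have at least one transition
    for i in 1:N
        s = sum(@view A[i, :])
        if s > 0
            A[i, :] ./= s
        end
    end
    return A
end

trajectories = [[1, 3], [2, 3, 4], [3, 1, 1], [1, 1, 3, 2]]
A = estimate_transitions_safe(trajectories, 4)
```

With the demo data, rows 1 to 3 match the matrix shown earlier and row 4 is all zeros.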