Tabular MDP/Q-learning solver: Reward Matrix R(s, a, s') vs. R(s, a)

Hello Folks,

I am using the POMDPs.jl package along with the TabularTDLearning package.

In the first part of the problem, I worked out a 3D transition matrix T(s’, a, s) and a 2D reward matrix R(s, a). I then formulated the problem as a tabular MDP and solved it with QLearningSolver, which is part of the TabularTDLearning package.

Next, suppose the reward also depends on the next state, so the R-matrix becomes 3D: R(s, a, s’). Can TabularMDP represent this? I have my doubts, because the documentation states that the R-matrix must be 2D.
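
One possible reduction, sketched here under the assumption that only the expected reward matters (so the optimal Q-values are unchanged, though individual sampled rewards would differ), is to average the 3D reward over next states using the transition matrix. The symbol \bar{R} is introduced only for illustration, and the index order follows the T(s’, a, s) convention above:

```latex
\bar{R}(s, a) = \sum_{s'} T(s', a, s)\, R(s, a, s')

% Why the optimal Q-values are unchanged:
Q^*(s, a) = \sum_{s'} T(s', a, s) \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]
          = \bar{R}(s, a) + \gamma \sum_{s'} T(s', a, s) \max_{a'} Q^*(s', a')
```

If that assumption holds, \bar{R} is a 2D matrix of the shape the documentation asks for.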

How should I handle this kind of R-matrix? Please advise. Thank you.