I am using the POMDPs.jl package along with the TabularTDLearning.jl package.
In the first part of the problem, I worked out a 3D transition matrix T(s’, a, s) and a 2D reward matrix R(s, a). I then formulated the problem as a tabular MDP and solved it with QLearningSolver, which is part of the TabularTDLearning package.
Next, suppose the reward matrix becomes 3D, i.e. R(s, a, s’), so that the reward also depends on the next state. Can TabularMDP represent it? I have my doubts, because the documentation states that the R-matrix must be 2D.
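One workaround I have considered, but am not sure is appropriate here, is to collapse the 3D reward into its expectation over next states, R(s, a) = Σ_s’ T(s’, a, s) · R(s, a, s’). As far as I understand, this leaves the optimal Q-values unchanged, although the individual rewards sampled during Q-learning would differ. Below is a minimal sketch of that arithmetic in plain Python, only to make the index order explicit; the name `expected_reward` and the toy numbers are my own placeholders, not part of either package.

```python
def expected_reward(T, R3):
    """Collapse R3[s][a][sp] into R2[s][a] by averaging over next states.

    T[sp][a][s] is P(sp | s, a), i.e. the same T(s', a, s) layout as above.
    """
    n_states = len(R3)
    n_actions = len(R3[0])
    R2 = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Weight each transition reward by its transition probability.
            R2[s][a] = sum(T[sp][a][s] * R3[s][a][sp] for sp in range(n_states))
    return R2


# Toy 2-state, 1-action example (made-up numbers, for illustration only).
T = [[[0.5, 0.0]],   # T[sp=0][a=0][s=0..1]
     [[0.5, 1.0]]]   # T[sp=1][a=0][s=0..1]
R3 = [[[0.0, 2.0]],  # R3[s=0][a=0][sp=0..1]
      [[0.0, 3.0]]]  # R3[s=1][a=0][sp=0..1]
R2 = expected_reward(T, R3)
```

If this reduction is valid for my case, the resulting 2D matrix could presumably be used in place of R(s, a) when building the tabular MDP.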
How should I handle this kind of R-matrix? Please advise. Thank you.