Hello,
I am using POMDPs.jl with a big state space (more than 2,000 possible states). In certain states some actions cannot be taken, i.e. the available actions are state-dependent.
I have checked the documentation (link here). For TabularTDLearning, I guess I cannot supply a function that returns the possible actions for a given state, so instead I give a bad reward to impossible actions and make the next state the same as the current state. However, as the state space is quite big, the solver doesn’t find the optimal policy.
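To make the workaround described above concrete, here is a minimal sketch in plain Python rather than POMDPs.jl code. Everything in it is hypothetical and chosen only for illustration: the five-state chain, the names `is_valid` and `step`, and the penalty of -10. It only shows the idea of "impossible action means stay in place and get a bad reward"; it is not the actual model.

```python
# Hypothetical toy problem: 5 states in a row, actions 0 = left, 1 = right.
N_STATES = 5
ACTIONS = (0, 1)
INVALID_PENALTY = -10.0  # assumed value, picked only for illustration


def is_valid(state, action):
    """Moving left from the first state or right from the last is impossible."""
    if action == 0:
        return state > 0
    return state < N_STATES - 1


def step(state, action):
    """Return (next_state, reward) with the penalty workaround applied."""
    if not is_valid(state, action):
        # Workaround: the agent stays where it is and receives a bad reward.
        return state, INVALID_PENALTY
    next_state = state - 1 if action == 0 else state + 1
    # Reaching the last state is the only rewarded transition in this toy chain.
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward
```

With this setup the learner still has to try every impossible (state, action) pair to discover that it is bad, which may be part of why learning is slow on a large state space.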
Do you think this is the right approach?