Impossible actions in POMDPs.jl

Hello,

I am using POMDPs.jl with a large state space (more than 2,000 possible states). In certain states some actions are not allowed, i.e. the problem has state-dependent actions.

I have checked the documentation. As far as I can tell, TabularTDLearning does not let me use a function to determine the legal actions for each state. As a workaround, I give impossible actions a large negative reward and make the next state equal to the current state. However, because the state space is quite large, the solver does not find the optimal policy.
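For illustration, the penalty workaround described above can be sketched as follows. This is a language-agnostic Python sketch, not TabularTDLearning's API; the chain problem, `is_legal`, `step_with_penalty` and the penalty value are all hypothetical stand-ins. Every action is offered in every state, and an illegal one simply leaves the state unchanged with a large negative reward.

```python
GOAL = 5                  # hypothetical chain: states 0..GOAL, GOAL is terminal
ACTIONS = (1, 2)          # hypothetical actions: advance by 1 or 2 cells
ILLEGAL_PENALTY = -100.0  # assumed value; must be much worse than any legal return

def is_legal(s, a):
    """Hypothetical legality rule: a move may not overshoot the goal."""
    return s + a <= GOAL

def step_with_penalty(s, a):
    """Penalty workaround: an illegal action keeps the agent in state s
    and returns ILLEGAL_PENALTY; a legal one moves forward at a cost of 1."""
    if not is_legal(s, a):
        return s, ILLEGAL_PENALTY, False
    sp = s + a
    return sp, -1.0, sp == GOAL
```

For example, `step_with_penalty(4, 2)` returns `(4, -100.0, False)`. Because the learner only finds out that an action is illegal by trying it, every illegal state-action pair has to be sampled before its value drops. With thousands of states, much of the exploration budget is spent on actions that can never be optimal, which is one plausible reason the learned policy ends up suboptimal.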

Do you think this is the right approach?

Bump, and I will ping @zsunberg for you.


Thanks :blush:

The current implementations in TabularTDLearning don't support action masking, and a PR adding it would be very welcome. ReinforcementLearning.jl probably has better support for this. POMDPs.jl (and other solvers in the ecosystem) does support state-dependent legal actions: you just need to define actions(mdp, s) for your problem so that it returns the legal action set for each state s.
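To make "action masking" concrete, here is a minimal Python sketch of tabular Q-learning with masking. It is not TabularTDLearning's or POMDPs.jl's code; the toy chain problem and the names `legal_actions`, `q_learning` and `greedy_rollout` are hypothetical. `legal_actions(s)` plays the role that actions(mdp, s) plays in POMDPs.jl: both the epsilon-greedy choice and the max in the update target range only over the legal actions, so illegal actions are never tried and never receive a Q-value.

```python
import random
from collections import defaultdict

GOAL = 5          # hypothetical chain: states 0..GOAL, GOAL is terminal
ACTIONS = (1, 2)  # hypothetical actions: advance by 1 or 2 cells

def legal_actions(s):
    """Analogue of actions(mdp, s): only moves that stay on the chain."""
    return [a for a in ACTIONS if s + a <= GOAL]

def step(s, a):
    """Deterministic toy dynamics: every move costs 1."""
    sp = s + a
    return sp, -1.0, sp == GOAL

def choose(Q, s, epsilon, rng):
    """Epsilon-greedy restricted to the legal action set of s."""
    legal = legal_actions(s)
    if rng.random() < epsilon:
        return rng.choice(legal)
    return max(legal, key=lambda a: Q[(s, a)])

def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (s, a) pairs start at 0.0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(Q, s, epsilon, rng)
            sp, r, done = step(s, a)
            # Bootstrap only over actions that are legal in the next state.
            future = 0.0 if done else max(Q[(sp, b)] for b in legal_actions(sp))
            Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])
            s = sp
    return Q

def greedy_rollout(Q, start=0):
    """Follow the greedy masked policy from start to the goal."""
    s, path = start, []
    while s != GOAL:
        a = max(legal_actions(s), key=lambda b: Q[(s, b)])
        path.append(a)
        s += a
    return path
```

Running `greedy_rollout(q_learning())` should give a three-move path whose moves sum to 5, the shortest possible. The design choice is that illegal actions are filtered out rather than penalized, so the table only holds entries for legal state-action pairs and no exploration is wasted on them.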