It seems as if the mask simply isn’t used. It also errors out on the first try; maybe I need some kind of initialization somewhere?
I ask because, after the error occurs (I’m just including the source file at the REPL), legal_action_space(env) and legal_action_space_mask(env) both show the expected values. There has to be something special about PPOPolicy that I don’t see.
It would be nice to see an example that uses PPOPolicy with an action mask.
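
To make clear what I expect the mask to do, here is a minimal sketch in plain Julia. It does not use PPOPolicy or any ReinforcementLearning.jl call; the helper name `masked_probs` and the example values are made up for illustration. I'm assuming a masked policy should set the logits of illegal actions to -Inf before the softmax, so that those actions end up with probability zero. Whether PPOPolicy actually works this way is exactly what I can't tell.

```julia
# Minimal sketch in plain Julia (no ReinforcementLearning.jl calls).
# `masked_probs` is a hypothetical helper name, not a library function.

"Softmax over `logits`, restricted to the actions where `mask` is true."
function masked_probs(logits::AbstractVector{<:Real}, mask::AbstractVector{Bool})
    length(logits) == length(mask) ||
        throw(DimensionMismatch("logits and mask must have the same length"))
    any(mask) || throw(ArgumentError("mask must allow at least one action"))
    # Illegal actions get -Inf, so they contribute exp(-Inf) == 0 to the softmax.
    masked = ifelse.(mask, float.(logits), -Inf)
    # Subtract the maximum for numerical stability.
    shifted = exp.(masked .- maximum(masked))
    return shifted ./ sum(shifted)
end

logits = [1.0, 2.0, 3.0, 4.0]       # made-up network output
mask = [true, false, true, false]   # made-up legal-action mask
probs = masked_probs(logits, mask)
```

With a mask like this, I would expect sampling from `probs` never to return action 2 or 4, which is not what I'm seeing with PPOPolicy.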