Using PPOPolicy with a custom environment with action masking in ReinforcementLearning.jl

It seems as if the mask simply isn't used. It also errors out on the first run; maybe I need some kind of initialization somewhere?

I say that because, after the error occurs (I'm just including the source file at the REPL), calling `legal_action_space(env)` and `legal_action_space_mask(env)` returns the expected values. There has to be something special about PPOPolicy that I'm not seeing.
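For context, this is roughly what I'd expect "using the mask" to mean inside a policy-gradient method like PPO. It's a minimal sketch in Python, not ReinforcementLearning.jl's actual PPOPolicy code, and `masked_softmax` / `sample_action` are hypothetical names. The idea is that the actor's logits for illegal actions are set to -Inf before the softmax, so those actions get probability exactly zero and can never be sampled.

```python
import math
import random


def masked_softmax(logits, mask):
    """Turn raw actor logits into action probabilities, giving zero
    probability to every action whose mask entry is False."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must allow at least one action")
    # Illegal actions get -inf, so exp() maps them to exactly 0.0
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)  # finite, because at least one action is legal
    exps = [math.exp(l - top) for l in masked]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index, drawing only from the legal actions."""
    probs = masked_softmax(logits, mask)
    legal = [i for i, m in enumerate(mask) if m]
    return rng.choices(legal, weights=[probs[i] for i in legal], k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 0.5, 1.0, -1.0]
    mask = [False, True, True, False]
    print(masked_softmax(logits, mask))  # indices 0 and 3 get probability 0.0
    print(sample_action(logits, mask, random.Random(0)))  # always 1 or 2
```

As far as I understand, a masked PPO also has to store the mask with each transition and reapply it when recomputing log-probabilities during the update; otherwise the probability ratio is computed over a different distribution than the one the actions were sampled from.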

It would be nice to see any example that uses PPOPolicy with an action mask :frowning: