Using PPOPolicy with custom environment with action masking in ReinforcementLearning.jl

Hi @mfg,

Could you share the code where you initialize the trajectory?

It should be something like:

julia> trajectory = MaskedPPOTrajectory(;
                   capacity = UPDATE_FREQ,
                   state = Matrix{Float32} => (ns, N_ENV),
                   action = Vector{Int} => (N_ENV,),
                   legal_actions_mask = Vector{Bool} => (na, N_ENV),
                   action_log_prob = Vector{Float32} => (N_ENV,),
                   reward = Vector{Float32} => (N_ENV,),
                   terminal = Vector{Bool} => (N_ENV,),
               )
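
To check your own setup against those shapes, here is a minimal sketch in plain Julia (no ReinforcementLearning.jl calls) of what one step of data would look like. The concrete values of `N_ENV`, `ns`, `na` and `UPDATE_FREQ` are made up for illustration, and I'm assuming `ns` is the state length and `na` the number of discrete actions; substitute the sizes from your environment.

```julia
# Hypothetical sizes, for illustration only; use your environment's values.
N_ENV = 4          # number of parallel environments
ns = 6             # length of one state vector (assumed meaning of `ns`)
na = 3             # number of discrete actions (assumed meaning of `na`)
UPDATE_FREQ = 32   # steps stored before an update (the trajectory capacity)

# One step of data, shaped as declared in the trajectory above.
state = zeros(Float32, ns, N_ENV)           # one column per environment
legal_actions_mask = fill(true, na, N_ENV)  # true marks a legal action
legal_actions_mask[2, 1] = false            # e.g. action 2 is illegal in env 1
action = ones(Int, N_ENV)
action_log_prob = zeros(Float32, N_ENV)
reward = zeros(Float32, N_ENV)
terminal = fill(false, N_ENV)

# Every per-step field has N_ENV as its last dimension.
for x in (state, legal_actions_mask, action, action_log_prob, reward, terminal)
    @assert size(x)[end] == N_ENV
end
```

If what your environment returns per step has different shapes (for example a mask of length `na` for a single environment), that mismatch is the first thing I would look at.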