When customizing an environment with ReinforcementLearning.jl, we need to transform the environment using some environment wrappers, because a TabularQApproximator only accepts states of type Int, as the example code shows:
wrapped_env = ActionTransformedEnv(
    StateTransformedEnv(
        env;
        state_mapping = s -> s ? 1 : 2,
        state_space_mapping = _ -> Base.OneTo(2)
    );
    action_mapping = i -> action_space(env)[i],
    action_space_mapping = _ -> Base.OneTo(3),
)
What is the underlying logic of this code? Specifically, state_mapping = s -> s ? 1 : 2 takes a raw environment state and turns it into an Int, while action_mapping = i -> action_space(env)[i] takes an Int and turns it into a raw environment action. Why do these two mappings run in opposite directions?
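To make the asymmetry I mean concrete, here is a minimal sketch of just the two lambdas, with a stand-in vector playing the role of action_space(env) (the Bool state and the three symbolic actions are hypothetical, not from any particular environment):

# state_mapping goes from the environment to the agent:
# a raw Bool state becomes the Int index the tabular approximator needs.
state_mapping = s -> s ? 1 : 2
state_mapping(true)    # gives 1
state_mapping(false)   # gives 2

# action_mapping goes from the agent back to the environment:
# the agent produces an index into Base.OneTo(3), and the wrapper
# looks up the corresponding raw action.
actions = [:left, :stay, :right]   # hypothetical stand-in for action_space(env)
action_mapping = i -> actions[i]
action_mapping(2)      # gives :stay

So one mapping consumes raw states and produces indices, and the other consumes indices and produces raw actions, which is exactly the opposite-direction behavior I am asking about.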