When customizing an environment with ReinforcementLearning.jl, we need to wrap it with some environment wrappers, because a TabularQApproximator only accepts states of type Int, as the example code shows:
wrapped_env = ActionTransformedEnv(
    StateTransformedEnv(
        env;
        state_mapping = s -> s ? 1 : 2,
        state_space_mapping = _ -> Base.OneTo(2)
    );
    action_mapping = i -> action_space(env)[i],
    action_space_mapping = _ -> Base.OneTo(3),
)
What is the underlying logic of this code? Specifically, state_mapping = s -> s ? 1 : 2 takes an environment state and returns an Int, while action_mapping = i -> action_space(env)[i] takes an Int and returns an environment action. Why do these two mappings function in opposite directions?
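To make the question concrete, here is a minimal sketch of my current mental model of the data flow, written in plain Julia without ReinforcementLearning.jl (the Bool state and the three-element action tuple are hypothetical stand-ins):

```julia
# Hypothetical environment with a Bool native state and three native actions.
env_state = true                          # what the environment emits
agent_state = env_state ? 1 : 2           # state_mapping: env state -> Int seen by the agent

native_actions = (:left, :stay, :right)   # stand-in for action_space(env)
agent_action = 3                          # the agent picks an Int index
env_action = native_actions[agent_action] # action_mapping: agent's Int -> native action
```

In other words, states appear to flow from the environment to the agent, while actions flow from the agent back to the environment, and I would like to confirm whether that is the intended reading.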