When customizing an environment using ReinforcementLearning.jl, we need to transform the environment using some environment wrappers, because a TabularQApproximator only accepts states of type Int, as the example code shows:
wrapped_env = ActionTransformedEnv(
    StateTransformedEnv(
        env;
        state_mapping = s -> s ? 1 : 2,
        state_space_mapping = _ -> Base.OneTo(2),
    );
    action_mapping = i -> action_space(env)[i],
    action_space_mapping = _ -> Base.OneTo(3),
)
So what is the underlying logic of this code? Specifically, why is state_mapping = s -> s ? 1 : 2 while action_mapping = i -> action_space(env)[i], so that the two mappings work in opposite directions?
I haven't checked the example you are talking about, but it seems like env here is an environment that represents its state as a boolean and its action as one of three values (I guess it might be (-1, 0, 1), but I haven't checked the example, so I'm not sure).
To turn these into indices for the Q-table that the agent can easily handle, we want to convert the states to the range 1-2 and the actions to the range 1-3.
So the transformations simply act as a layer that translates between the env and agent representations. When the agent reads a state, it is transformed from a bool to 1-2, and when it supplies an action as 1-3, it is transformed to whatever the action space is for the env. That is why the two mappings point in opposite directions: states flow from the env to the agent, while actions flow from the agent to the env.
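To make the two directions concrete, here is a rough sketch in plain Julia. It does not use the actual ReinforcementLearning.jl wrappers; the action names (:left, :stay, :right) and the helpers observe and act! are made up for illustration, and the Q-table layout is just an assumption.

```julia
# Toy stand-ins; none of these names come from ReinforcementLearning.jl.
env_actions = (:left, :stay, :right)   # hypothetical native action space of the env
env_state = Ref(true)                  # the env's native state is a Bool
received = Ref{Any}(nothing)           # last native action the env received

# env -> agent: applied to the state the agent reads.
state_mapping = s -> s ? 1 : 2

# agent -> env: applied to the action the agent supplies.
action_mapping = i -> env_actions[i]

# What the agent sees when it observes the wrapped env.
observe() = state_mapping(env_state[])

# What the env receives when the agent acts on the wrapped env.
act!(i) = (received[] = action_mapping(i))

# A tabular agent only ever works with integer indices.
Q = zeros(3, 2)          # assumed layout: rows = actions 1-3, columns = states 1-2
Q[3, 1] = 1.0            # pretend action 3 looks best in state 1

s = observe()            # 1 (Bool true mapped to index 1)
a = argmax(Q[:, s])      # 3 (greedy action index)
act!(a)                  # env receives :right, its own representation
```

The agent never sees a Bool or a Symbol, and the env never sees an index. Each mapping is applied at the point where a value crosses the boundary, which is why one goes from native to index and the other from index to native.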