I am a beginner in both Julia and RL. I'm trying to train a tabular Q-based policy with an off-policy strategy using ReinforcementLearning.jl. The idea is to let `behavior_policy` be a `RandomPolicy` (for example) that collects the experience, while `target_policy`, a `QBasedPolicy`, is the one being trained. I wrote the following code:
```julia
using ReinforcementLearning
using Flux   # for InvDecay

# wrapped_env is my (wrapped) environment, defined earlier

# greedy target policy whose Q-table should be learned
target_policy = QBasedPolicy(
    learner = MonteCarloLearner(;
        approximator = TabularQApproximator(;
            n_state = length(state_space(wrapped_env)),
            n_action = length(action_space(wrapped_env)),
            opt = InvDecay(1)
        )
    ),
    explorer = EpsilonGreedyExplorer(0)
)

# random behavior policy that generates the episodes
behavior_policy = RandomPolicy(action_space(wrapped_env))
p = OffPolicy(target_policy, behavior_policy)

agent = Agent(
    policy = p,
    trajectory = VectorSARTTrajectory()
)

hook = TotalRewardPerEpisode()
run(
    agent,
    wrapped_env,
    StopAfterEpisode(100),
    hook
)
```
But it doesn't work: after the run, `p.π_target.learner.approximator.table` is still all zeros.
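To be concrete, this is how I check it after the run finishes (the shape comment is only my assumption from the constructor arguments above):

```julia
# Inspect the target policy's Q-table after the 100 episodes.
# I assume it is an n_action × n_state matrix, per the TabularQApproximator arguments.
Q = p.π_target.learner.approximator.table
all(iszero, Q)   # true, i.e. nothing was learned
```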
Can anyone tell me where the problem lies? Do I misunderstand off-policy reinforcement learning, or am I not using `OffPolicy` correctly?
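For context, my understanding of off-policy Monte Carlo (following Sutton & Barto, Chapter 5, so not necessarily what `MonteCarloLearner` implements internally) is that the target policy's action values can still be estimated from episodes generated by the behavior policy by weighting the returns with importance-sampling ratios:

$$q_\pi(s, a) = \mathbb{E}_b\!\left[\rho_{t+1:T-1}\, G_t \mid S_t = s,\, A_t = a\right], \qquad \rho_{t+1:T-1} = \prod_{k=t+1}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)},$$

so even with a purely random `behavior_policy` I expected the Q-table of `target_policy` to end up with some non-zero entries.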
Thanks!