Hello community,
I have been trying to implement basic Q-learning following the ReinforcementLearning.jl tutorial. The problem appears when, instead of giving the agent a `RandomPolicy()`, I give it the `QBasedPolicy` developed in the tutorial. The run then fails with the following error:
```
BoundsError: attempt to access 2×7 Matrix{Float64} at index [0, 1]
```
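For comparison, the same setup runs to completion for me when the agent's policy is just `RandomPolicy()`. A sketch of that working baseline (everything else identical to the failing code below):

```julia
using ReinforcementLearning

# Identical run, but with RandomPolicy() instead of the QBasedPolicy —
# this version finishes all 10 episodes without errors.
env = RandomWalk1D()
agent = Agent(
    policy = RandomPolicy(),
    trajectory = Trajectory(
        ElasticArraySARTSTraces(;
            state = Int64 => (),
            action = Int64 => (),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    ),
)
run(agent, env, StopAfterNEpisodes(10), TotalRewardPerEpisode())
```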
For reference, the failing code is as follows:
```julia
using Revise
using ReinforcementLearning
using Flux
using ReinforcementLearningTrajectories

# create the environment
env = RandomWalk1D()
state(env)

# create the policy
S, A = state_space(env), action_space(env)
NS, NA = length(S), length(A)
# basic Q-learning policy: a TD learner over a tabular Q-approximator,
# with an epsilon-greedy explorer
policy = QBasedPolicy(
    learner = TDLearner(
        TabularQApproximator(
            n_state = NS,
            n_action = NA,
        ),
        :SARS,
    ),
    explorer = EpsilonGreedyExplorer(0.1),
)
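# Note: RandomWalk1D has 7 states and 2 actions, so NS == 7 and NA == 2 —
# the Q-table is a 2×7 matrix, the same shape as in the error above.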
# define a trajectory
trajectory = Trajectory(
    ElasticArraySARTSTraces(;
        state = Int64 => (),
        action = Int64 => (),
        reward = Float64 => (),
        terminal = Bool => (),
    ),
    DummySampler(),
    InsertSampleRatioController(),
)
agent = Agent(
    policy = policy,
    trajectory = trajectory,
)

run(agent, env, StopAfterNEpisodes(10), TotalRewardPerEpisode())
```
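Given that the Q-table is 2 actions × 7 states, the `[0, 1]` index suggests the policy is producing an action index of 0 at some point. Is there a way to query the policy directly, outside the run loop, to confirm that? A minimal sketch of what I have in mind, assuming `RLBase.plan!` is the right entry point in the current API (please correct me if not):

```julia
reset!(env)  # put the environment back in its start state

# Ask the policy for a single action outside the run loop; with a
# 2×7 Q-table I would expect this to return 1 or 2.
a = RLBase.plan!(policy, env)
@show a
```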