In ReinforcementLearningAnIntroduction/notebooks/Chapter01_Tic_Tac_Toe.jl, in the Testing section, there is the comment "I leave it as an exercise to change the select_action to a greedy version and confirm that our trained policy reaches a tie everytime." I'm sorry to admit that I've failed at this seemingly simple exercise. As I understand it, once the MultiAgentPolicy is trained, I need to create a new version of select_action (call it test_select_action) that uses a GreedyExplorer instead of an EpsilonGreedyExplorer. That part is easy enough, but when I try to reassign the policy's mapping to the new function, it fails with an error saying it cannot convert an object of one function's type to the other's. So I'm obviously doing it wrong, but I don't see how to do it correctly.
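Here is a minimal sketch of what I mean by the greedy version; `test_select_action` and `greedy_explorer` are my own names, and the body just mirrors what I remember of the notebook's select_action (looking up the value of each afterstate), with GreedyExplorer swapped in for the EpsilonGreedyExplorer, so adapt the details to your copy of the notebook:

```julia
using ReinforcementLearning

const greedy_explorer = GreedyExplorer()

# Same shape as the notebook's select_action(env, V), but always greedy.
function test_select_action(env, V)
    A = legal_action_space(env)                 # legal moves in the current state
    values = map(a -> V(child(env, a)), A)      # estimated value of each afterstate
    A[greedy_explorer(values)]                  # take the arg-max, no ϵ-exploration
end

# Roughly what fails afterwards (the exact field path depends on how the
# MultiAgentPolicy nests its per-player policies):
# trained_policy.mapping = test_select_action   # => the conversion error described above
```

Defining the function is the easy part; it's the reassignment in the last (commented) line that produces the conversion error.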
For context, I've built an environment to play Reversi (a.k.a. Othello). To test it after training, I play against it, going first. According to Wikipedia (and in my experience), on a 4×4 or 6×6 board the second player should always win with perfect play. In my tests, after a training run of 200_000 episodes, the trained agent always wins going second on a 4×4 board, but on a 6×6 board it usually doesn't. Since I'm still evaluating with the EpsilonGreedyExplorer policy, that could be the problem, or it could simply need more training; until I can figure out how to substitute the GreedyExplorer, I can't make that determination.