Getting started with ReinforcementLearning.jl

In ReinforcementLearningAnIntroduction/notebooks/Chapter01_Tic_Tac_Toe.jl, the Testing section contains the comment “I leave it as an exercise to change the select_action to a greedy version and confirm that our trained policy reaches a tie everytime.” I’m sorry to admit that I’ve failed at this seemingly simple exercise. As I understand it, once the MultiAgentPolicy is trained, I need to create a new version of select_action (let’s call it test_select_action) that uses a GreedyExplorer instead of an EpsilonGreedyExplorer. That part is easy enough, but when I try to reassign the policy’s mapping to the new function, it fails because it can’t convert an object of type select_action to type test_select_action. Obviously I’m doing something wrong, but I don’t see how to do it correctly.
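The conversion error comes from how Julia types functions: every function has its own singleton type, so a struct whose field type is a type parameter (assumed here to be how VBasedPolicy stores `mapping`) cannot have that field swapped for a different function. The sketch below reproduces the error without ReinforcementLearning.jl; `DemoPolicy` and both selector bodies are hypothetical stand-ins, not the notebook's code.

```julia
# Hypothetical stand-in for a policy whose `mapping` field type is a type
# parameter (an assumption about how VBasedPolicy is declared).
mutable struct DemoPolicy{M}
    mapping::M
end

select_action(values) = rand(eachindex(values))   # stand-in for the ε-greedy version
test_select_action(values) = argmax(values)       # greedy version

p = DemoPolicy(select_action)   # M is inferred as typeof(select_action)

# Each function has its own singleton type, so `convert` has no method to turn
# test_select_action into a typeof(select_action) and the assignment throws.
caught = try
    p.mapping = test_select_action
    nothing
catch err
    err
end
println(typeof(caught))   # prints MethodError

# Constructing a new object instead lets M become typeof(test_select_action).
q = DemoPolicy(test_select_action)
println(q.mapping([0.1, 0.7, 0.2]))   # prints 2
```

In the real case, the analogous fix is to build a new policy object around the already-trained learner rather than mutating the old one.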

For context, I’ve built an environment to play Reversi (AKA Othello). To test it after training, I play against the trained agent, moving first. According to Wikipedia (and in my experience), on a 4×4 or 6×6 board the second player should always win with perfect play. In my tests, after a training run of 200_000 episodes, the agent always wins moving second on a 4×4 board, but on a 6×6 board it usually doesn’t. Since I’m still using the policy with the EpsilonGreedyExplorer, that could be the problem, or the agent may simply need more training; until I can figure out how to substitute the GreedyExplorer, I can’t tell which.


You could simply set the explorer’s epsilon to zero, though that may not be the proper way to do it.
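Why ε = 0 works in principle: an ε-greedy rule picks a random action with probability ε and the highest-valued action otherwise, so with ε = 0 only the greedy branch can fire. The function below is a hypothetical illustration of that rule, not ReinforcementLearning.jl’s EpsilonGreedyExplorer; whether the library’s explorer exposes a settable ε, and how it breaks ties, are not assumed here (`argmax` simply returns the first maximum).

```julia
using Random

# Hypothetical ε-greedy selector: with probability ε pick a random action index,
# otherwise the index of the largest value.
function epsilon_greedy(values::AbstractVector, ε::Real; rng = Random.default_rng())
    rand(rng) < ε ? rand(rng, eachindex(values)) : argmax(values)
end

values = [0.1, 0.7, 0.2]
# rand(rng) is always in [0, 1), so `rand(rng) < 0.0` is never true:
# with ε = 0 every call takes the greedy branch.
println(epsilon_greedy(values, 0.0))   # prints 2
```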

Otherwise, you could recreate the structure for testing (assuming it is the same structure as the one presented in Chapter 1) with something like the following. I haven’t tested it, so it may need some changes:

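```julia
test_policies = MultiAgentManager((
    NamedPolicy(x => VBasedPolicy(;
        learner = p.policy.learner,   # reuse the trained learner (assumed field names)
        mapping = test_select_action  # greedy action selection
    ))
    for (x, p) in policies.agents
)...)
```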