In ReinforcementLearningAnIntroduction/notebooks/Chapter01_Tic_Tac_Toe.jl, in the Testing section, there is the comment "I leave it as an exercise to change the select_action to a greedy version and confirm that our trained policy reaches a tie everytime." I'm sorry to admit that I've failed at this seemingly simple exercise. As I understand it, once the MultiAgentPolicy is trained, I need to create a new version of select_action (call it test_select_action) that uses a GreedyExplorer instead of an EpsilonGreedyExplorer. That part is easy enough, but when I try to reassign the policy's mapping to the new function, it fails with an error saying it cannot convert an object of one function's type to the other's. So I'm obviously doing it wrong, but I don't see how to do it correctly.
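Here is a minimal sketch of what I mean by the greedy version; `test_select_action` and `greedy_explorer` are my own names, and the body just mirrors what I remember of the notebook's select_action (looking up the value of each afterstate), with GreedyExplorer swapped in for the EpsilonGreedyExplorer, so adapt the details to your copy of the notebook:

```julia
using ReinforcementLearning

const greedy_explorer = GreedyExplorer()

# Same shape as the notebook's select_action(env, V), but always greedy.
function test_select_action(env, V)
    A = legal_action_space(env)                 # legal moves in the current state
    values = map(a -> V(child(env, a)), A)      # estimated value of each afterstate
    A[greedy_explorer(values)]                  # take the arg-max, no ϵ-exploration
end

# Roughly what fails afterwards (the exact field path depends on how the
# MultiAgentPolicy nests its per-player policies):
# trained_policy.mapping = test_select_action   # => the conversion error described above
```

Defining the function is the easy part; it's the reassignment in the last (commented) line that produces the conversion error.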
For context, I've built an environment to play Reversi (a.k.a. Othello). To test it after training, I play against it, going first. According to Wikipedia (and in my experience), on a 4×4 or 6×6 board the second player should always win with perfect play. In my tests, after a training run of 200_000 episodes, the trained agent always wins going second on a 4×4 board, but on a 6×6 board it usually doesn't. Since I'm still evaluating with the EpsilonGreedyExplorer policy, that could be the problem, or it could simply need more training; until I can figure out how to substitute the GreedyExplorer, I can't make that determination.