[ANN] Announcing AlphaZero.jl

I just had an interesting conversation on the Lc0 Discord, which answered my question about two different uses of the move selection temperature. For people following this thread, I am summarizing my conclusions here.

  • After running N MCTS iterations to plan a move, let N_a denote the number of times action a is explored, so that N = \sum_a N_a.
  • The resulting game policy is to play action a with probability \pi_a := N_a^{1/\tau} / \sum_b N_b^{1/\tau}, where \tau is the move selection temperature. However, the policy target that should be used to update the neural network is (N_a/N)_a and not \pi (see the sketch after this list).
  • I think the AlphaGo Zero paper is misleading here, as it uses the notation \pi to denote both the policy followed during self-play and the training target, suggesting these should be the same.
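
To make the distinction concrete, here is a minimal Julia sketch. The names `play_policy` and `training_target` are hypothetical and not part of the AlphaZero.jl API; the point is simply that \tau only shapes the policy sampled during play, while the training target is the raw visit distribution.

```julia
# Play-time policy: apply the move selection temperature τ to the visit counts.
function play_policy(visits::Vector{Float64}, τ::Float64)
    p = visits .^ (1 / τ)   # π_a ∝ N_a^(1/τ)
    return p ./ sum(p)      # normalize into a probability distribution
end

# Training target for the network: N_a / N, independent of τ.
training_target(visits::Vector{Float64}) = visits ./ sum(visits)

visits = [10.0, 30.0, 60.0]
play_policy(visits, 0.5)    # sharper than the visit proportions since τ < 1
training_target(visits)     # [0.1, 0.3, 0.6]
```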

Also:

  • In Lc0, two temperature parameters are introduced. The first one is the move selection temperature, which corresponds to the \tau parameter described above. The second one (which does not appear in the AlphaGo Zero paper) is called the policy temperature and it is applied to the softmax output of the neural network to form the prior probabilities used by MCTS.
  • Typically, the policy temperature should be greater than 1 and the move selection temperature should be less than 1 (see the sketch below).
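
For illustration, here is a minimal sketch of how the policy temperature could be applied. The function name `mcts_priors` is my own, and dividing the logits by the temperature before the softmax is one possible formulation, not necessarily how Lc0 implements it.

```julia
# Numerically stable softmax.
function softmax(x)
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

# Policy temperature T applied to the raw network outputs (logits) over legal moves.
# T > 1 flattens the priors fed to MCTS; T = 1 recovers the plain softmax.
mcts_priors(logits, T) = softmax(logits ./ T)

logits = [2.0, 1.0, 0.1]
mcts_priors(logits, 2.0)    # flatter distribution than softmax(logits)
```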

I am going to update AlphaZero.jl accordingly. I expect this to result in a significant improvement of the connect four agent.
