I am excited to announce AlphaZero.jl: a generic, simple and fast implementation of DeepMind’s AlphaZero algorithm.
- The core algorithm is only 2,000 lines of pure, hackable Julia code.
- Generic interfaces make it easy to add support for new games or new learning frameworks.
- Being between one and two orders of magnitude faster than competing alternatives written in Python, this implementation enables you to solve nontrivial games on a standard desktop computer with a GPU.
Beyond its much-publicized success in reaching superhuman level at games such as Chess and Go, DeepMind’s AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. I believe that this methodology can have exciting applications in many different research areas.
Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++) and optimized for highly distributed computing environments. This makes them hard to approach for students, researchers and hackers.
The motivation for this project is to provide an implementation of AlphaZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources.
I found the Julia language to be instrumental in achieving this goal.
To download AlphaZero.jl and start training a Connect Four agent, just run:
    git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
    cd AlphaZero.jl
    julia --project -e "import Pkg; Pkg.instantiate()"
    julia --project --color=yes scripts/alphazero.jl --game connect-four train
Each training iteration takes between 60 and 90 minutes on a desktop computer with an Intel Core i5 9600K processor and an 8GB Nvidia RTX 2070 GPU. I plot below the evolution of the win rate of AlphaZero against two baselines (a vanilla MCTS baseline and a minmax agent that plans at depth 5 using a handcrafted heuristic):
Note that the AlphaZero agent is not exposed to the baselines during training and learns purely from self-play, without any form of supervision or prior knowledge.
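For readers unfamiliar with the idea behind the minmax baseline mentioned above, here is a minimal sketch of depth-limited search with a handcrafted heuristic. It is not the baseline shipped with AlphaZero.jl: it uses the negamax formulation of minmax, a toy subtraction game instead of Connect Four, and the names `legal_moves`, `heuristic`, `negamax` and `best_move` are all hypothetical.

```julia
# Toy game (an assumption for illustration): players alternately remove 1 to 3
# stones from a pile, and whoever takes the last stone wins.
legal_moves(stones::Int) = 1:min(3, stones)

# Hypothetical handcrafted heuristic, from the point of view of the player to
# move: piles that are multiples of 4 are bad for the player to move.
heuristic(stones::Int) = stones % 4 == 0 ? -0.5 : 0.5

# Depth-limited negamax: value of the position for the player to move.
function negamax(stones::Int, depth::Int)
    stones == 0 && return -1.0              # the opponent just took the last stone
    depth == 0 && return heuristic(stones)  # search budget exhausted: use the heuristic
    return maximum(-negamax(stones - m, depth - 1) for m in legal_moves(stones))
end

# Choose the move with the best value after searching `depth` plies.
function best_move(stones::Int, depth::Int)
    moves = collect(legal_moves(stones))
    values = [-negamax(stones - m, depth - 1) for m in moves]
    return moves[argmax(values)]
end

# Plan at depth 5 from a pile of 7 stones.
move = best_move(7, 5)
```

When the search reaches a terminal position it returns the exact outcome. Otherwise it falls back on the heuristic once the depth budget runs out, which is what makes the strength of such a baseline depend on the quality of the handcrafted evaluation.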
I also evaluate the performance of the neural network alone against the same baselines. Instead of plugging it into MCTS, the action that is assigned the highest prior probability is played at each state:
Unsurprisingly, the network alone is initially unable to win a single game. However, it ends up being competitive with the minmax agent despite not being able to perform any search.
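The network-only player described above amounts to a greedy choice over the prior probabilities produced by the network. The sketch below illustrates that rule under stated assumptions: `greedy_action` is a hypothetical helper rather than part of the AlphaZero.jl API, the prior values are made up, and masking illegal moves is my own addition to keep the example well defined.

```julia
# Hypothetical helper, not part of the AlphaZero.jl API.
# `priors[i]` is the prior probability the network assigns to `actions[i]`,
# and `legal[i]` tells whether that action is available in the current state.
function greedy_action(actions::AbstractVector,
                       priors::AbstractVector{<:Real},
                       legal::AbstractVector{Bool})
    @assert length(actions) == length(priors) == length(legal)
    @assert any(legal) "no legal action available"
    # Mask out illegal actions so that they can never be selected.
    masked = [ok ? p : -Inf for (p, ok) in zip(priors, legal)]
    return actions[argmax(masked)]
end

# Connect Four has seven columns; suppose column 4 is already full.
columns = 1:7
priors = [0.05, 0.10, 0.20, 0.40, 0.15, 0.05, 0.05]  # made-up values
legal = [true, true, true, false, true, true, true]
best = greedy_action(columns, priors, legal)
```

No lookahead happens here: the move is picked from a single network evaluation, which is why the comparison with a searching agent is informative.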
For more information on training a Connect Four agent using AlphaZero.jl, see the full tutorial.
- Documentation Home
- An Introduction to AlphaZero
- Package Overview
- Connect-Four Tutorial
- Hyperparameters Documentation
Contributions to AlphaZero.jl are most welcome. Many contribution ideas are available in the contribution guide. Please do not hesitate to open a GitHub issue to share any idea, feedback or suggestion!