Announcing AlphaZero.jl

I am excited to announce AlphaZero.jl: a generic, simple, and fast implementation of DeepMind’s AlphaZero algorithm.

  • The core algorithm is only 2,000 lines of pure, hackable Julia code.
  • Generic interfaces make it easy to add support for new games or new learning frameworks.
  • Because it is one to two orders of magnitude faster than competing alternatives written in Python, this implementation makes it possible to solve nontrivial games on a standard desktop computer with a GPU.

Why should I care about AlphaZero?

Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind’s AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. I believe that this methodology can have exciting applications in many different research areas.

Why should I care about this implementation?

Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++) and optimized for highly distributed computing environments. This makes them hardly accessible for students, researchers and hackers.

The motivation for this project is to provide an implementation of AlphaZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources.

I found the Julia language to be instrumental in achieving this goal.

Training a Connect Four Agent

To download AlphaZero.jl and start training a Connect Four agent, just run:

git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --project -e "import Pkg; Pkg.instantiate()"
julia --project --color=yes scripts/alphazero.jl --game connect-four train

Each training iteration takes between 60 and 90 minutes on a desktop computer with an Intel Core i5 9600K processor and an 8GB Nvidia RTX 2070 GPU. I plot below the evolution of the win rate of AlphaZero against two baselines (a vanilla MCTS baseline and a minmax agent that plans at depth 5 using a handcrafted heuristic):

Note that the AlphaZero agent is not exposed to the baselines during training and learns purely from self-play, without any form of supervision or prior knowledge.

I also evaluate the performance of the neural network alone against the same baselines. Instead of plugging it into MCTS, the network plays the action with the highest prior probability at each state:

Unsurprisingly, the network alone is initially unable to win a single game. However, it ends up being competitive with the minmax agent despite not being able to perform any search.
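As a sketch of what "network only" play amounts to (the names below are illustrative, not the actual AlphaZero.jl API): with no search, the agent simply plays the action to which the network assigns the highest prior probability.

```julia
# Illustrative sketch of search-free play: pick the action with the
# highest prior probability, as output by the policy network.
function best_prior_action(priors::Vector{Float64}, actions::Vector{Symbol})
    @assert length(priors) == length(actions)
    return actions[argmax(priors)]
end
```

For instance, `best_prior_action([0.1, 0.7, 0.2], [:left, :center, :right])` returns `:center`.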

For more information on training a Connect Four agent using AlphaZero.jl, see the full tutorial.

Resources

Contributing

Contributions to AlphaZero.jl are most welcome. Many contribution ideas are available in the contribution guide. Please do not hesitate to open a GitHub issue to share any ideas, feedback or suggestions!

97 Likes

As one of the contributors to Leela Chess Zero (an open-source AlphaZero implementation for chess), I want to say congrats and keep up the good work! Is the MCTS here batched and multi-threaded? Also, how many nodes per second (with what hardware and what size network) are you getting?

1 Like

Thanks for the encouragement!
As explained here, our MCTS implementation is batched but not multi-threaded. I think this is a good trade-off between implementation simplicity and speed (batching alone delivers a 20x speedup in our Connect Four experiment).
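The idea behind batching can be sketched as follows (this is an illustration of the principle, not the actual AlphaZero.jl code): leaf states from many concurrent MCTS simulations are stacked and evaluated in a single network call, which keeps the GPU busy instead of issuing many tiny forward passes.

```julia
# Illustrative sketch of batched inference: stack leaf states from many
# concurrent simulations and run one forward pass for all of them.
function evaluate_batch(network, states::Vector{Vector{Float64}})
    batch = reduce(hcat, states)  # stack state vectors into one matrix
    return network(batch)         # a single forward pass for all leaves
end
```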

Looking at the screenshot in GitHub’s README, during the first iteration of the Connect Four experiment, the system makes 38 moves per second, using 600 MCTS simulations per move. This corresponds to expanding about 38 × 600 ≈ 23K nodes per second. We use a tiny neural network of about 600K parameters. However, the GPU is currently underutilized, so I would expect these numbers to scale pretty well with larger networks.

1 Like

From reading through some of the source and documentation, my first comment would be that you can get a basically free 2x speedup by using fp16 instead of fp32 (on RTX GPUs at least). Lc0 hasn’t had success with Int8, but fp16 should basically be free. For a reference on what should be possible in terms of speed, https://docs.google.com/spreadsheets/d/1lGFf6PLGmBUSMan-YP7Vul4DpRNfn6K8oeCjBILe6uA/edit#gid=1508569046 has good numbers for what Lc0 gets on various hardware. With an RTX 2060, 20,000 MCTS nodes per second should be possible on a 20x256 network. (Note that Lc0 has hand-written CUDA kernels, so Julia probably won’t be able to match this.)
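To make the fp16 idea concrete (a hedged sketch, not Lc0 or AlphaZero.jl code): storing weights in Float16 halves their memory footprint and bandwidth, which is where much of the speedup comes from on RTX GPUs. Real frameworks provide their own conversion utilities; this just shows the principle on a raw parameter array.

```julia
# Illustrative conversion of a parameter array from fp32 to fp16.
# Each Float16 value takes 2 bytes instead of 4.
to_fp16(w::Array{Float32}) = Float16.(w)
```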

2 Likes

Thanks for these references.
The numbers you give me are consistent with the computation I made in my previous answer (which I edited).

You are perfectly right about using fp16 to speed up inference. In fact, as mentioned in the contribution guide, I would also like to add support for Int8 quantization, as some people at Oracle did.

I should have a conversation with @maleadt about how well CuArrays would support this.

@Oscar_Smith I am curious: could you elaborate a bit on the Int8 experiments that were made with Lc0 and why they were not successful?

We haven’t tried in about a year, but when we did, the quantized Int8 networks were several hundred Elo worse.

2 Likes

Interesting. I am working with DRL myself and I thought that it would be nice to have a unified framework in Julia for RL. Maybe this is a step towards it.

Thanks for your interest! Regarding a unified RL framework for Julia, you may be interested in the excellent ReinforcementLearning.jl framework that is currently being developed by @findmyway, @jbrea and others.

In fact, @findmyway and I are currently discussing integrating AlphaZero.jl into this framework. Please feel free to join the conversation in the corresponding GitHub issue.

More generally, I would love to hear your thoughts on what you would expect from a unified RL framework and how we can best move towards it!

1 Like

This looks really cool. I’m curious if RAM limitations become an issue? Like does this work well with 16GB of RAM or would I need something larger in the 32GB to 64GB range…or higher? I’d also assume some of the hyperparameters end up drastically changing the memory requirements?

When it finishes a run, does it save its state so that you could play it later? Or possibly have two different runs play each other? Or am I just being silly?

When using this on a new game, does it expect each player to alternate turns, or does it have the concept of player X getting to make another move? Something like checkers, where if you jump you can jump again, or the dots game, where if you complete a square you draw another line? Or do the “available” moves the game presents need to include all the “go again” combinations as a single move?

My last question would be: does the number of available moves drastically affect the memory usage? I’m assuming it could drastically increase the computations.

Sorry, I look at this and wonder how I can use it, rather than training an AI just to train an AI. :slight_smile:

1 Like

I think this is very cool. I have spent more than an hour just reading the source code, and I think I have learned a lot :-).

Also, I teach this topic at my university (in an AI course with a small part about game theory: minmax, …), and maybe I will show it to my students. Thanks again!

1 Like

Many great questions here!

Actually, to train the Connect Four agent in the tutorial with the current choice of hyperparameters, even 4GB of RAM should be enough. The main parameter that determines RAM usage is how often you reset the MCTS tree (see reset_mcts_every), and this parameter is typically set to a fairly small value.
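As a sketch of how such a parameter might look (the struct and field names here are illustrative, not the exact AlphaZero.jl parameter structure): a self-play parameter like reset_mcts_every bounds the memory used by MCTS by periodically discarding the search tree.

```julia
# Illustrative self-play parameters: `reset_mcts_every` bounds tree
# memory by discarding the MCTS tree at a fixed interval.
Base.@kwdef struct SelfPlaySketch
    num_games::Int = 5000
    reset_mcts_every::Int = 1  # discard the tree after every game
end
```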

Indeed, the current user interface saves the training environment automatically after each training iteration. You can interrupt training at any time and resume it later, after playing a few games against your agent and examining the MCTS statistics. More generally, AlphaZero.jl comes with batteries included and features tools and utilities to get you started quickly. For more details, I encourage you to have a look at the tutorial.

This feature is not exposed in the console interface (yet) but it should only be a few lines of code using the API.

There is no built-in assumption that players have to alternate turns. When implementing the play! function in the Game Interface, turn switches must be declared explicitly. More generally, when adding support for a game, there are often many possible ways to define what an action and a turn are, as you perfectly illustrated with your checkers example.
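To sketch the idea (this is an illustration, not the exact Game Interface): the game state records whose turn it is, and play! decides explicitly whether the turn switches, so "go again" rules like those in checkers or dots are easy to express.

```julia
# Illustrative state with an explicit turn marker.
mutable struct SketchState
    white_to_play::Bool
end

# The turn switches only when the rules say so; a "go again" move
# leaves the same player to move.
function play!(state::SketchState, go_again::Bool)
    if !go_again
        state.white_to_play = !state.white_to_play  # explicit turn switch
    end
    return state
end
```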

The average number of available moves does affect the memory consumption of MCTS and the memory footprint of the experience buffer. However, I would not expect RAM to be a limiting factor when using AlphaZero.jl. MCTS should not be a problem if you reset the tree often enough. As for the experience buffer, we might need to add an option to store it on disk at some point, which should not affect performance.

If you want to try AlphaZero.jl on a game with a very high branching factor and on a single machine, I would honestly worry more about lacking resources to do enough exploration than I would worry about RAM.

My main goal with AlphaZero.jl is to offer a valuable resource for students and researchers. The AlphaZero algorithm is extremely general and I think it can find applications in many research domains (including automated theorem proving, which is my own research area). I have been surprised to see that, despite the general excitement around AlphaZero, very few people actually tried to build on it. One explanation, I think, is the lack of accessible open-source implementations. I am trying to bridge this gap with AlphaZero.jl.

13 Likes

@dmolina I am delighted to hear this. Making AlphaZero.jl a valuable pedagogical resource has been one of my main goals and I am looking forward to seeing people use it to teach AI.

Please keep me updated and tell me if there is anything I can do to help make it fit your purposes even better.

2 Likes

RAM issues are actually a pretty major deal for long searches on fast hardware. Lc0 has done a lot of work to get its nodes down to 80 bytes, and people will still fill up 60 GB of RAM for multi-hour searches.

Thanks. This is great, I’m excited to start playing with this.

Interesting! I am curious: how frequently are you resetting the MCTS tree during self-play and did you make any study on how it impacts learning? I have never found good information on this.

To be clear, my point was not to say that RAM cannot be an issue when training an AlphaZero agent. I would just expect that, for a single individual running a self-contained experiment on their own machine, lack of computing power will probably become a limitation before lack of RAM does. I might be wrong on this, though.

Anyway, I am always happy to discuss memory optimizations and I thank you for sharing all these insights from Lc0. :slight_smile:

During learning, we use the Kullback–Leibler divergence between the tree 100 nodes ago and the tree now, so as to spend fewer nodes on easy positions and more on harder ones, with an average of around 800 nodes per move. The biggest RAM limitation when training is VRAM for large (256x16 or bigger) networks. Lc0 solves the computing-power issue by distributing game generation among a community of volunteers. The biggest memory optimizations are storing moves rather than boards in nodes, and storing policies in 16 bits rather than 32. We also have both nodes and edges, since this allows us to avoid storing a lot of information for leaves (which are the majority of nodes).
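The adaptive stopping rule described above can be sketched as follows (an assumption about the mechanism, not Lc0’s actual code): search a position until the MCTS visit distribution stabilizes, as measured by the Kullback–Leibler divergence between the distribution now and the one from 100 nodes ago.

```julia
# KL divergence between two discrete visit distributions p and q.
kl_divergence(p, q) = sum(pi * log(pi / qi) for (pi, qi) in zip(p, q) if pi > 0)

# Stop searching early when the distribution has barely moved since the
# earlier snapshot (tolerance is an illustrative value).
has_stabilized(p_old, p_new; tol=1e-3) = kl_divergence(p_new, p_old) < tol
```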

1 Like

I think a blog post would be very popular

2 Likes

@Ratingulate Thanks! Do you have a particular platform in mind?