I am pretty proud to announce that SeaPearl, the package Félix Chalumeau and I have been working on for the past year, is finally out.
It aims to be a complete Constraint Programming solver that can also use a Reinforcement Learning agent as a value-selection heuristic, to choose more intelligently which value to branch on.
It comes with a few generators that let the agent learn on randomly generated instances of the graph coloring problem, the knapsack problem and the TSPTW (travelling salesman problem with time windows), and with a set of examples available in a separate repository.
While the paper accompanying the launch shows a first proof of concept based on earlier experiments, SeaPearl should be seen above all as a great sandbox for other researchers to explore this (in our opinion) promising path of combining RL and CP.
The project, endorsed by Polytechnique Montréal through Quentin Cappart and Louis-Martin Rousseau, has welcomed new team members: Tom Marty, Kim Rioux-Paradis, Tom Sander and Pierre Tessier. They are already improving the existing tool and will try out exciting new experiments with it.
Feel free to try it, and get back to us if you have any questions! One of our next steps is to make it easier to use, for instance by adding JuMP support.