I am not aware of any existing Julia repo, but here are some thoughts.
I think that learning a good Sokoban player is a pretty hard problem given the current state of the art in RL. I would not expect an approach based purely on learning (DQN, policy gradients) to succeed unless you throw DeepMind-level computing power at it.
Instead, I would really bet on a combination of learning and search, as done in AlphaZero. (Note that I may be a bit biased here as I just released the first version of AlphaZero.jl).
With a few modifications, I think AlphaZero could work pretty well. However:
- Sokoban is a one-player game, so you cannot rely on self-play as easily as in Connect Four, Chess or Go. However, I would expect curriculum learning to work pretty well in the case of Sokoban (in particular, see the section on asymmetric self-play).
- To make search more efficient, you might want to expose higher-level actions than just {Left, Right, Up, Down}. For example, it might be interesting to have a macro action such as “go there”, implemented using a hard-coded pathfinding algorithm (a rough sketch is given below).
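As an illustration, here is a rough sketch of the kind of helper a “go there” macro action could be built on: a breadth-first search enumerating the squares the player can reach without pushing any box. The `walls`/`boxes` sets and the `(row, col)` position encoding are just assumptions made for the example; adapt it to your own board representation.

```julia
# Squares the player can reach without pushing any box (plain BFS).
# `pos` is a (row, col) tuple; `walls` and `boxes` are collections of such tuples.

const MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

function reachable_squares(pos, walls, boxes)
    visited = Set([pos])
    queue = [pos]
    while !isempty(queue)
        (r, c) = popfirst!(queue)
        for (dr, dc) in MOVES
            next = (r + dr, c + dc)
            if next ∉ visited && next ∉ walls && next ∉ boxes
                push!(visited, next)
                push!(queue, next)
            end
        end
    end
    return visited
end
```

With something like this, the actions exposed to the search can be “walk next to a box and push it once” rather than individual steps, so the tree only branches on pushes and not on every walking move.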
Another big challenge in Sokoban is that some moves are irreversible: after a wrong push, the level can become unsolvable and the episode would never end on its own. Therefore, you may want the agent to “learn when to give up” (or just put a limit on the length of episodes).
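To illustrate the second option, here is a minimal sketch of an episode loop with a step budget. The `select_action`, `apply` and `solved` functions are placeholders for whatever interface you end up with, and `max_steps = 200` is an arbitrary number:

```julia
# Minimal episode loop with a hard cap on length.
# `select_action(state)` picks a move, `apply(state, action)` returns the new
# state and a reward, and `solved(state)` tells whether the level is complete.

function play_episode(state, select_action, apply, solved; max_steps = 200)
    total_reward = 0.0
    for _ in 1:max_steps
        action = select_action(state)
        state, reward = apply(state, action)
        total_reward += reward
        solved(state) && break   # stop as soon as the level is solved
    end
    return total_reward          # episodes that hit the cap simply count as failures
end
```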