What is the status of ReinforcementLearning.jl?

Indeed, with automated conversions alone it should be possible to make the algorithms of DeepQLearning.jl and Crux.jl (both built on the POMDPs.jl interface) available to users working with the ReinforcementLearning.jl interface. Together, those two packages cover most of the baseline algorithms (DQN with some extensions, REINFORCE, PPO, SAC, etc.), but there are a few problems I run into as a user, so here is a wish list for Julia’s RL ecosystem:

(1) Unfortunately, GPU training support seems to be unmaintained and broken in both packages, so I’m only using the CPU at the moment. It would be nice if someone has the energy to fix this, either within the original repos or as part of a rebooted ReinforcementLearning.jl package.

(2) Neither package has native support for action masking. I was able to hack the feature in with a custom neural network that sets the output values of invalid actions to -Inf (a rough sketch is shown after this list), but built-in support would be convenient.

(3) The two packages depend on TensorBoardLogger.jl, which is very useful once it’s set up, but it brings in Python dependencies and the complication of multi-language package management. It would be nice to have a pure-Julia RL package that keeps any Python dependencies behind package extensions.

(4) More advanced algorithms (e.g. Rainbow DQN) are always welcome, of course.
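
For reference, the hack in (2) looked roughly like the sketch below. Only the Flux.jl calls are real API; the network sizes and the way the mask is obtained from the environment are made-up placeholders.

using Flux

# Minimal sketch of the -Inf masking hack from (2). `mask` is a Bool vector
# with `true` for valid actions; how it is obtained from the environment is
# problem-specific and not shown here.
struct MaskedQNetwork{M}
    q::M   # any Flux model mapping a state to one Q-value per action
end

function (net::MaskedQNetwork)(s, mask::AbstractVector{Bool})
    qvals = net.q(s)
    # Invalid actions get -Inf so argmax (or a softmax policy) never picks them.
    return ifelse.(mask, qvals, oftype.(qvals, -Inf))
end

# Usage with made-up sizes: 4 state features, 3 actions.
net  = MaskedQNetwork(Chain(Dense(4 => 16, relu), Dense(16 => 3)))
s    = rand(Float32, 4)
mask = [true, false, true]        # action 2 is invalid in this state
best = argmax(net(s, mask))       # never selects action 2
# To train through it, register the struct as a layer
# (Flux.@layer or Flux.@functor, depending on the Flux version).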


Hello,

In my team, we are developing algorithms for RL problems with combinatorial action spaces (discrete but too large to be fully enumerated).
So far we have implemented our own custom interface and algorithms from scratch.
I was looking at how we could implement a cleaner and more generic version of our algorithms that is compatible with the Julia RL ecosystem.
From our outside perspective, ReinforcementLearning.jl seemed abandoned, especially seeing the recent Crux.jl built on the POMDPs.jl interface instead.
That’s why we were also thinking of using the POMDPs.jl interface.

After reading this thread, that no longer seems so clear, and I’m a bit lost in this ecosystem.

In any case, if there is a need for contributors, I would be happy to help.


I’m also interested in these efforts.

  • Since we have CommonRLInterface.jl, integration of Agents.jl with RL could probably be done to a large extent via this interface, independently of the current status of RL.jl (a rough sketch of such a wrapper is shown after this list). Of course, concrete examples will need (presumably multi-agent) RL algorithm implementations, but I believe that any design work in this area would still be very worthwhile. Being able to run experiments with any Agents.jl model would open up interesting opportunities for research, and I think figuring out how to do this cleanly is an interesting (software engineering) problem in its own right.
  • My understanding was that RL.jl v0.10.2 had a zoo with a lot of algorithms. However, I recently tried to get it up and running again and failed to do so; a lot of things have presumably changed around Flux.jl and the various AD frameworks. On the other hand, the current v0.11.x does not have those implementations yet, so there is a lot of work in porting the old zoo algorithms to the new version, adding tests, and fixing any bugs.
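
To make the Agents.jl point a bit more concrete, here is a rough sketch of what such a wrapper could look like, assuming CommonRLInterface.jl’s five required functions (reset!, actions, observe, act!, terminated). All the problem-specific pieces (make_model, apply_action!, my_observation, my_reward, the action set, the horizon) are hypothetical placeholders, and the Agents.jl step! call may need adapting to how the model’s stepping is set up.

import Agents
using CommonRLInterface
const RL = CommonRLInterface

# Hypothetical wrapper turning an Agents.jl model into a CommonRLInterface env.
mutable struct ABMEnv{M} <: RL.AbstractEnv
    model::M
    t::Int
end

function RL.reset!(env::ABMEnv)
    env.model = make_model()              # placeholder: rebuild the ABM
    env.t = 0
    return nothing
end

RL.actions(env::ABMEnv) = 1:3                       # placeholder action set
RL.observe(env::ABMEnv) = my_observation(env.model) # placeholder features
RL.terminated(env::ABMEnv) = env.t >= 100           # placeholder horizon

function RL.act!(env::ABMEnv, a)
    apply_action!(env.model, a)     # placeholder: write the action into the model
    Agents.step!(env.model, 1)      # advance the agent-based simulation one step
    env.t += 1
    return my_reward(env.model)     # act! returns the reward in CommonRLInterface
end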

Do you know which features of the POMDPs.jl interface they use that are not in CommonRLInterface.jl? Based on this discussion, I think it would really be great to move everything shared between (PO)MDP solving and RL to CommonRLInterface.jl.

I think we should really get action masking and any kind of imaginable action space right in CommonRLSpace.jl (or CommonRLInterface.jl, potentially).
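
For what it’s worth, CommonRLInterface.jl already defines optional valid_actions/valid_action_mask functions (if I remember correctly), so a masking-aware solver could query them along these lines; mask_or_all is just an illustrative helper name:

using CommonRLInterface
const RL = CommonRLInterface

# Fall back to "all actions valid" when the environment doesn't provide a mask.
function mask_or_all(env)
    if RL.provided(RL.valid_action_mask, env)
        return RL.valid_action_mask(env)
    else
        return trues(length(RL.actions(env)))
    end
end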

I’d love to see framework-independent implementations of deep RL methods. People should be able to use Flux.jl, Lux.jl, or custom models, and get GPU support through these frameworks or, e.g., Reactant.jl.

I agree; logging should be much more flexible, e.g. with an extension for TensorBoardLogger.jl or Wandb.jl.
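
To illustrate the extension idea: with Julia’s package extensions (1.9+), the core package could define a no-op logging hook and only wire it to TensorBoard when TensorBoardLogger.jl is loaded. Everything named MyRL below is hypothetical; TBLogger and log_value are the actual TensorBoardLogger.jl API, to the best of my memory.

# Hypothetical extension module, loaded automatically when both MyRL and
# TensorBoardLogger are in the environment (declared under
# [weakdeps] / [extensions] in MyRL's Project.toml).
module MyRLTensorBoardExt

using MyRL                                   # hypothetical core RL package
using TensorBoardLogger: TBLogger, log_value

# MyRL would define `log_metric!` with a generic no-op fallback;
# this method makes it write scalars to TensorBoard.
function MyRL.log_metric!(lg::TBLogger, name::AbstractString, value::Real; step::Integer)
    log_value(lg, name, value; step = step)
end

end # module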

State-dependent actions in POMDPs.jl are returned by the method

actions(m::Union{MDP,POMDP}, s)

where m is the model and s is the state. If you omit the second argument s, the entire action space is returned. Presumably the single-argument method does not need to be implemented when enumerating the full action space is not practical.
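
As a minimal illustration (the toy grid-world type and its rules are made up; only the two actions methods are the actual POMDPs.jl interface):

using POMDPs

# Toy MDP with (x, y) states and symbolic actions.
struct GridMDP <: MDP{Tuple{Int,Int},Symbol} end

# Full action space (one-argument method):
POMDPs.actions(::GridMDP) = (:up, :down, :left, :right)

# State-dependent actions: e.g. forbid moving up from the top row (y == 1).
function POMDPs.actions(m::GridMDP, s::Tuple{Int,Int})
    y = s[2]
    return y == 1 ? (:down, :left, :right) : POMDPs.actions(m)
end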

However, POMDPs.jl doesn’t directly support returning a binary mask array for actions. It’s easy to convert the state-dependent actions to a binary mask, with some overhead.
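
Something along these lines, assuming a discrete action space for which actionindex (part of POMDPs.jl) is defined:

using POMDPs

# Turn the state-dependent actions into a Bool mask over the full action space.
function action_mask(m::Union{MDP,POMDP}, s)
    mask = falses(length(actions(m)))
    for a in actions(m, s)
        mask[actionindex(m, a)] = true
    end
    return mask
end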

Note that this is only about the interface; the actual RL packages (Crux.jl and DeepQLearning.jl) support neither state-dependent actions nor masking. There are probably non-RL POMDP solvers that do exploit this interface, though.
