I would like to ask: what is the status of ReinforcementLearning.jl? Specifically, is there an active community around it, so that it is worth learning it, using it, and fixing the bugs?
To an outside observer, it seems like a big project, but there does not appear to be much activity around it. Is there any alternative?
Read the online documentation! Most likely the answer is already provided in an example or in the API documents. Search using the search bar in the upper left.
Chat with us in Julia Slack in the #reinforcement-learning channel.
Post a question in the Julia discourse forum in the category “Machine Learning” and use “reinforcement-learning” as a tag.
This repo is currently more or less stale due to the original creator having left the project mid-refactor. The core of the package was refactored, but the algorithms were not.
I was actively contributing to RL.jl with @jeremiahpslewis a bit more than one year ago. We had the intention to complete the refactor along with continuing to improve the package. RL.jl is a fairly ambitious project in my opinion. It has a lot of potential to be the library for easily designing new algorithms. However, the amount of work required to bring it back to a feature-complete state is daunting for two people. On my end, I am currently finishing writing my PhD thesis and haven’t had time (nor need) for RL.jl.
So yes, the project is big but doesn’t have enough contributors (with time). If you want to contribute, we’ll happily help. If you prefer an alternative, POMDPs.jl can do some RL and is more stable and active. It depends on your needs.
Thanks for the answer, and congratulations on finishing your PhD. That is a great achievement.
I understand the size of the project. What worries me a bit is that even the new API does not suit our intended application. As I have written on Slack, we need the possibility to work with environments that have an infinite action space, even though in every state the action space is finite. This naturally occurs when the state is defined as a graph, where some actions are available on every vertex (here is a link to a paper of our student who does that: [2009.12462] Symbolic Relational Deep Reinforcement Learning based on Graph Neural Networks and Autoregressive Policy Decomposition). It seems to me that with the current API this is not practical, since legal_action_space should provide a list of all actions and legal_action_space_mask returns a mask indicating which actions are legal.
I have worked with continuous and infinite action spaces. The API you mention is restrictive and, in my opinion, ought to be reworked. That said, if you use an algorithm that does not call legal_action_space(_mask), then simply don’t define it for your environments. You’ll get an error when attempting to “show” the environment, though, so you can instead define the action space as a dummy space and never use it.
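To make the workaround concrete, here is a minimal sketch of what such an environment could look like. GraphEnv, feasible_actions, and step! are made-up names, and the exact RLBase method signatures differ between versions, so treat this as illustrative pseudocode rather than a tested implementation:

```julia
using ReinforcementLearningBase
const RLBase = ReinforcementLearningBase

# Toy stand-in for a graph-structured state: just a set of vertex ids.
mutable struct GraphEnv <: RLBase.AbstractEnv
    vertices::Vector{Int}
    done::Bool
end
GraphEnv() = GraphEnv([1], false)

RLBase.state(env::GraphEnv) = copy(env.vertices)
RLBase.action_space(env::GraphEnv) = nothing   # dummy space, never enumerated
RLBase.reward(env::GraphEnv) = env.done ? 1.0 : 0.0
RLBase.is_terminated(env::GraphEnv) = env.done
RLBase.reset!(env::GraphEnv) = (env.vertices = [1]; env.done = false; nothing)

# Not part of the RLBase interface: the state-dependent action set that a
# custom algorithm queries instead of legal_action_space(_mask).
feasible_actions(env::GraphEnv) =
    [(v, op) for v in env.vertices for op in (:expand, :remove)]

# Acting (RLBase versions differ on `(env)(a)` vs `act!(env, a)`, so a plain
# helper is used here to stay version-agnostic).
function step!(env::GraphEnv, (v, op))
    op === :expand ? push!(env.vertices, maximum(env.vertices) + 1) :
                     deleteat!(env.vertices, findfirst(==(v), env.vertices))
    env.done = isempty(env.vertices) || length(env.vertices) >= 5
    return env
end
```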
@Tortar and I are interested in integrating Agents.jl with this library for a GSoC project. We wanted to know whether the maintainers are still active. If we were to make pull requests, would there be someone able to review and accept them?
@bergio13 @Tortar I’d be happy to review and merge pull requests! It’s a pity ReinforcementLearning.jl is not in a better shape, but I have very limited time for active development.
I’ve been playing with reinforcement learning in Julia for the past few months. I’ve been using Crux.jl’s implementation of the proximal policy optimization (PPO) algorithm. The current master branch of Crux.jl (but not the latest release version in the General registry) works fine on my laptop as well as on a remote server. You can try the classic CartPole training example from Gymnasium with the following example code: https://github.com/sisl/Crux.jl/blob/master/examples/rl/cartpole.jl
if you also install the other dependencies (Flux, POMDPs, POMDPGym).
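For reference, a rough setup could look like this (assuming POMDPGym needs to be added by URL if it is not in the General registry):

```julia
using Pkg
Pkg.add(url = "https://github.com/sisl/Crux.jl")  # master branch, not the registered release
Pkg.add("Flux")
Pkg.add("POMDPs")
Pkg.add("POMDPGym")  # if unregistered, try: Pkg.add(url = "https://github.com/sisl/POMDPGym.jl")
```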
Please let me know if either of the above options works! I didn’t write any of the packages but submitted minor issues and PRs to fix broken dependencies recently.
Unfortunately, I do not have a clear plan right now.
Thanks to @jbrea for pinging me here. I wasn’t aware of this thread earlier.
I shifted to the LLM field about two years ago. This is quite a competitive field and trying to keep up with the latest progress almost eats up all my spare time. That is to say, I haven’t actively maintained RL.jl for a long time. Recently I was actually switching jobs, and it just got finalized this week. So I’d take this chance to share some reflections. Hopefully, they may be useful for future maintainers.
Obviously, the development of RL.jl has slowed down. But is this specific to RL.jl?
Actually, no. If you look into the commit history of some popular RL packages in the Python world developed around the same time as RL.jl, you’ll find that they show a similar pattern.
I think we have done two or three large refactors of RL.jl in the past. The first was from tabular RL to deep RL. The second added support for more diverse RL algorithms. The most important lesson I learned (only later on) is that developing and maintaining a research-oriented project is quite different from an industry one. With limited resources, we’d better focus on a narrow scope and make sure those features are useful. Generalization is the last thing to consider.
What have we done right?
Separation of different components. (Although some may not agree)
New algorithms may come with quite different training recipes or network architectures. Still, we can separate out some common components like the experience replay buffer or the environment interfaces.
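As a toy illustration of that point (this is plain Julia, not the actual ReinforcementLearningTrajectories.jl API), a replay buffer can be written without knowing anything about the algorithm that will consume it:

```julia
# A generic circular replay buffer, agnostic to the algorithm using it.
struct ReplayBuffer{T}
    capacity::Int
    data::Vector{T}
end
ReplayBuffer{T}(capacity::Int) where {T} = ReplayBuffer{T}(capacity, T[])

function Base.push!(buf::ReplayBuffer, transition)
    length(buf.data) == buf.capacity && popfirst!(buf.data)  # drop the oldest entry
    push!(buf.data, transition)
    return buf
end

sample_batch(buf::ReplayBuffer, n::Int) = rand(buf.data, n)  # uniform minibatch

# Any algorithm (DQN, SAC, ...) can reuse it with its own transition type:
buf = ReplayBuffer{NamedTuple}(10_000)
push!(buf, (s = 1, a = 2, r = 0.0, s′ = 2, done = false))
batch = sample_batch(buf, 1)
```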
The next step of RL.jl?
Thanks to @HenriDeh’s and @jeremiahpslewis’s great work in the past, the code structure is quite light now. But if people want to integrate it with other packages, there’s still a lot of work to do, and I’m always willing to help.
The future of RL in the Julia world?
Personally, I really wish for a package like trl or verl in the Julia world. However, I’m quite pessimistic about it. But as they say, the best way to predict the future is to create it, so I think I’ll still give it a try in the near future.
I agree with all of the above. A few additional points:
RL.jl has recently undergone a significant refactoring—particularly in its types, methods, and subpackages, including the use of ReinforcementLearningTrajectories.jl. It’s now quite performant (though some type stability issues remain). However, it still follows a “batteries not included” philosophy: many algorithm implementations and documentation are currently missing. A key issue in the past was the lack of unit tests for provided algorithms, making them difficult to maintain, especially for non-experts.
POMDPs.jl is a well-structured ecosystem (though I have limited experience with it for now) that includes a variety of algorithms, both deep and tabular. It raises the question of whether most RL use cases in Julia could—or should—be built on top of it. If functionality is missing, perhaps it would be more effective to extend POMDPs.jl rather than split efforts across multiple frameworks. I plan to port some of my own work to POMDPs.jl and share feedback afterward.
One of Julia’s great strengths—and also one of its potential pitfalls—is composability. When this breaks down, the benefits of the language are lost. Looking at packages like trl or verl, we see a common Python pattern of offering all-in-one solutions. The key question, then, is whether POMDPs.jl offers a sufficiently flexible interface to allow extensibility through external packages. Alternatively, if RL.jl has a distinct vision for a better interface, perhaps we can aim for compatibility with POMDPs.jl algorithms—so that we have unified implementations of, say, Q-learning or DQN—but with RL.jl providing a more composable and flexible run loop.
I am a distant observer. I am interested in RL, but mainly to investigate its advantages and drawbacks with respect to model-based planning. My main problem with the current API of RL.jl is that it does not easily allow domains with an infinite number of actions, which frequently occur in relational domains. By contrast, POMDPs.jl allows it through the generative interface. So even though RL.jl specifies a lot of properties of the domain, it does not have what I need.
Imagine that my state has the form of a graph. Through some actions, new vertices can be added or deleted. Every vertex can have some actions, but not all of them may be possible. An example of this kind of domain is a simulation of a penetration tester who is discovering a computer network (each node is a computer), with attacks possibly applied to the computers.
In RL.jl, this means that I need to set ActionStyle to FullActionSet, but then legal_action_space_mask returns a mask over the actions returned by legal_action_space.
What I would actually need is something like legal_actions returning, in every state, the set of feasible actions. This is (under a different name) how it is done in the generative interface of POMDPs.jl.
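For comparison, here is roughly what that looks like with the POMDPs.jl convention actions(mdp, s); the PenTestMDP type and the attack names below are made up for illustration:

```julia
using POMDPs
using Graphs  # assumed here for the graph-valued state

# A made-up MDP whose state is a graph and whose actions are (vertex, attack) pairs.
struct PenTestMDP <: MDP{SimpleGraph{Int}, Tuple{Int, Symbol}} end

# The global action space is effectively unbounded, so only the
# state-dependent form is defined and used by generative solvers:
POMDPs.actions(::PenTestMDP, s::SimpleGraph) =
    [(v, attack) for v in vertices(s) for attack in (:scan, :exploit)]
```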
Sorry, I think I wasn’t clear. I get that this works in POMDPs.jl; the question is, given the good support for many domains in POMDPs.jl, is there still a use case / niche for RL.jl apart from existing users (like myself)?
When I found RL.jl, I was hoping it would be a go-to repository for all popular RL methods, but due to the unfinished transition, that is not the case. I have to admit that I did not know about the implementations of RL algorithms in the POMDPs.jl ecosystem; @greatpet listed Crux.jl and DeepQLearning.jl. The POMDPs.jl repositories are kind of scattered.
The scope of POMDPs.jl is “An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.” It’s about solving MDPs and POMDPs with known transitions, not reinforcement learning, where the transitions are unknown. I always envisioned ReinforcementLearning.jl as the go-to repository for reinforcement learning algorithms, just as @Tomas_Pevny mentioned. Of course, POMDPs.jl and ReinforcementLearning.jl share some features, such as interactions with the environment. This is why there was a lot of interaction between the developers of POMDPs.jl and ReinforcementLearning.jl in the past, resulting e.g. in CommonRLInterface.jl. Therefore, I do not really see a split of efforts across multiple frameworks.
It would be nice to add this feature to CommonRLSpaces.jl (I guess). If it were eventually merged into CommonRLInterface.jl, it would be available in a unified way to both the POMDPs and the RL ecosystems.
To unite forces and move ReinforcementLearning.jl forward, I think it would be amazing to have a to-do list with more details about the algorithms, documentation, and tests that are missing, and what needs to be done to refactor the algorithms, for example. Would you have the time to do this, @jeremiahpslewis, @HenriDeh?
In general, it seems like a missed opportunity if RL.jl and POMDPs.jl each end up with their own Q-learning, deep Q-learning, etc. implementations, particularly as the POMDPs.jl ones are only lightly tested and RL.jl is missing them at the moment. On the algorithm side, I’d love to see a common API, and I think duplicated effort is reduced most effectively if the POMDPs.jl algorithms implement this API and RL.jl picks up support for it, rather than reinventing/reimplementing algorithms for RL.jl.
Fully agree. My understanding was that the RL implementations in POMDPs.jl are mostly there until ReinforcementLearning.jl stabilizes and is fully tested. I see CommonRLInterface.jl as the common API (it is already a dependency for both). If POMDPs.jl implements features that would be useful for ReinforcementLearning.jl, we could move them there, no?