ANN: QuickPOMDPs and POMDPs v0.8

Hi All,

I’d like to announce some enhancements to the POMDPs ecosystem. First, we have introduced the QuickPOMDPs.jl package, which makes it much easier to define Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) without sacrificing much flexibility. Instead of the object-oriented approach of the main POMDPs.jl interface, it uses a composition-oriented approach in which each element of the problem can be specified as a function, object, or simple value.

The classic mountain car problem, with visualization courtesy of Compose.jl, looks like this:

using QuickPOMDPs
using Compose: context, circle, line, star, rectangle, fill, stroke, compose, Mirror

mountaincar = QuickMDP(
    # generative dynamics: map (state, action, rng) to the next state and reward
    function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,
    actions = [-1., 0., 1.],
    initialstate = (-0.5, 0.0),
    discount = 0.95,
    isterminal = s -> s[1] > 0.5,

    # optional visualization hook, used by POMDPGifs below
    render = function (step)
        cx = step.s[1]
        cy = 0.45*sin(3*cx)+0.5
        car = (context(), circle(cx, cy+0.035, 0.035), fill("blue"))
        track = (context(), line([(x, 0.45*sin(3*x)+0.5) for x in -1.2:0.01:0.6]), stroke("black"))
        goal = (context(), star(0.5, 1.0, -0.035, 5), fill("gold"), stroke("black"))
        bg = (context(), rectangle(), fill("white"))
        ctx = context(0.7, 0.05, 0.6, 0.9, mirror=Mirror(0, 0, 0.5))
        return compose(context(), (ctx, car, track, goal), bg)
    end
)

For comparison, the equivalent Python implementation in OpenAI Gym is much slower. Moreover, the POMDPSimulators and POMDPGifs packages can be used to generate animations in the REPL, Juno, or Jupyter notebooks that look like this:

[animated gif of the mountain car simulation]
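
An animation like this might be produced along the following lines (a sketch rather than a definitive recipe: `energize` is a hypothetical hand-written policy, and I am assuming `FunctionPolicy` from POMDPPolicies, `stepthrough` from POMDPSimulators, and `makegif` from POMDPGifs):

using POMDPSimulators
using POMDPPolicies
using POMDPGifs
import Cairo  # rendering backend for Compose

# a simple hand-written policy: accelerate in the direction of the current velocity
energize = FunctionPolicy(s -> s[2] < 0.0 ? -1.0 : 1.0)

# step through a simulation interactively in the REPL...
for (s, a, r) in stepthrough(mountaincar, energize, "s,a,r", max_steps=200)
    @show s, a, r
end

# ...or record the same policy as an animated gif
makegif(mountaincar, energize; filename="mountaincar.gif", fps=20)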

This represents a significant improvement over other frameworks like OpenAI Gym, especially in a classroom setting for a Reinforcement Learning, Decision Making under Uncertainty, or AI course. Specifically, it allows problems to be specified quickly and clearly with much more flexibility (e.g. with explicit transition probabilities rather than just a simulation model, as sketched below), without sacrificing speed compared to compiled languages, let alone interpreted ones. All QuickPOMDP models are compatible with the solvers written for POMDPs.jl and with RLInterface.jl, and there is also a bridge to ReinforcementLearning.jl that will hopefully be expanded in the future.
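
For a flavor of the explicit style, a hypothetical tiger-style problem might be written roughly like this (a sketch, assuming the `Deterministic`, `Uniform`, and `SparseCat` distribution helpers from POMDPModelTools and the keyword conventions shown in the QuickPOMDPs documentation):

using QuickPOMDPs
using POMDPModelTools  # Deterministic, Uniform, SparseCat

tiger = QuickPOMDP(
    states = [:tiger_left, :tiger_right],
    actions = [:open_left, :open_right, :listen],
    observations = [:heard_left, :heard_right],
    initialstate = Uniform([:tiger_left, :tiger_right]),
    discount = 0.95,

    # explicit transition distributions instead of a simulator
    transition = function (s, a)
        if a == :listen
            return Deterministic(s)  # listening leaves the tiger where it is
        else
            return Uniform([:tiger_left, :tiger_right])  # the problem resets after a door is opened
        end
    end,

    # noisy observation of the tiger's location while listening
    observation = function (a, sp)
        if a == :listen
            heard = sp == :tiger_left ? [:heard_left, :heard_right] : [:heard_right, :heard_left]
            return SparseCat(heard, [0.85, 0.15])  # 85% chance of hearing correctly
        else
            return Uniform([:heard_left, :heard_right])
        end
    end,

    reward = function (s, a)
        if a == :listen
            return -1.0
        elseif (a == :open_left) == (s == :tiger_left)  # opened the tiger’s door
            return -100.0
        else
            return 10.0
        end
    end
)

# any POMDPs.jl solver can then consume the model, e.g. (assuming QMDP.jl):
# using QMDP
# policy = solve(QMDPSolver(), tiger)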

We have also made major flexibility enhancements to the main POMDPs.jl v0.8 interface (thanks to @MaximeBouton, @lassepe, @rejuvyesh, @shushman), adding the ability to specify a custom dynamic decision network (DDN) structure. This paves the way for new problem classes, such as constrained POMDPs, and new solution methods that involve factorization.
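
For a small taste of what this enables, a solver can now ask the model’s dynamic decision network for exactly the nodes it needs. A minimal sketch, assuming the v0.8 `gen`/`DDNOut` generative interface and the `mountaincar` model defined above:

using POMDPs
using Random: MersenneTwister

# sample only the :sp (next state) and :r (reward) nodes of the DDN
sp, r = gen(DDNOut(:sp, :r), mountaincar, (-0.5, 0.0), 1.0, MersenneTwister(1))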

All of this has been made possible by the hard work of the Julia developers and community, and we are very grateful for everyone’s efforts. In my opinion, Julia is definitely the best tool for research and teaching in this area, and it has been extremely helpful to my own understanding!
