Using AbstractEnv from CommonRLInterface with POMDPs

Hello. I am trying to use reinforcement learning to solve a simple problem as a proof of concept. I am using DeepQLearning and defining an AbstractEnv (the interface from CommonRLInterface). I am getting the following error and cannot figure out how to make it happy. I believe CommonRLInterface and DeepQLearning should work together, but maybe I misunderstood the documentation. Has anyone seen this error, and do you know how to fix it?

LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{SVector{2, Integer}}

I figured out that it is referring to this struct:

struct DQExperience{N <: Real, T <: Real, A <: AbstractArray{T}}
    s::A        # observation
    a::N        # action index
    r::T        # reward
    sp::A       # next observation
    done::Bool  # terminal flag
end

Source code of my project:

# A simple grid world MDP
# All cells with reward are also terminal

using CommonRLInterface
using StaticArrays
using Compose
using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies
import ColorSchemes

const RL = CommonRLInterface

mutable struct GridWorld <: AbstractEnv
    size::SVector{2, Integer}
    rewards::Dict{SVector{2, Integer}, Float64}
    state::SVector{2, Integer}
end

function GridWorld()
    rewards = Dict(SA[9,3]=> 10.0,
                   SA[8,8]=>  3.0,
                   SA[4,3]=>-10.0,
                   SA[4,6]=> -5.0)
    return GridWorld(SA[10, 10], rewards, SA[rand(1:10), rand(1:10)])
end

RL.reset!(env::GridWorld) = (env.state = SA[rand(1:env.size[1]), rand(1:env.size[2])])
RL.actions(env::GridWorld) = (SA[1,0], SA[-1,0], SA[0,1], SA[0,-1])
POMDPs.actions(env::GridWorld) = RL.actions(env::GridWorld)
RL.observe(env::GridWorld) = env.state
RL.terminated(env::GridWorld) = haskey(env.rewards, env.state)

function RL.act!(env::GridWorld, a)
    if rand() < 0.4 # 40% chance of going in a random direction (= 30% chance of going in a wrong direction)
        a = rand(POMDPs.actions(env))
    end
    env.state = clamp.(env.state + a, SA[1,1], env.size)
    return get(env.rewards, env.state, 0.0)
end

# optional functions
@provide RL.observations(env::GridWorld) = [SA[x, y] for x in 1:env.size[1], y in 1:env.size[2]]
@provide RL.clone(env::GridWorld) = GridWorld(env.size, copy(env.rewards), env.state)
@provide RL.state(env::GridWorld) = env.state
@provide RL.setstate!(env::GridWorld, s) = (env.state = s)

@provide function RL.render(env::GridWorld)
    nx, ny = env.size
    cells = []
    for s in observations(env)
        r = get(env.rewards, s, 0.0)
        clr = get(ColorSchemes.redgreensplit, (r+10.0)/20.0)
        cell = context((s[1]-1)/nx, (ny-s[2])/ny, 1/nx, 1/ny)
        compose!(cell, rectangle(), fill(clr), stroke("gray"))
        push!(cells, cell)
    end
    grid = compose(context(), linewidth(0.5mm), cells...)
    outline = compose(context(), linewidth(1mm), rectangle(), stroke("gray"))
    s = env.state
    agent_ctx = context((s[1]-1)/nx, (ny-s[2])/ny, 1/nx, 1/ny)
    agent = compose(agent_ctx, circle(0.5, 0.5, 0.4), fill("orange"))
    sz = min(w,h)
    return compose(context((w-sz)/2, (h-sz)/2, sz, sz), agent, grid, outline)
end

# load MDP model from POMDPModels or define your own!
env = GridWorld();

# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(POMDPs.actions(env))))

exploration = EpsGreedyPolicy(env, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))

solver = DeepQLearningSolver(qnetwork = model, max_steps=10000,
                             exploration_policy = exploration,
                             learning_rate=0.005, log_freq=500,
                             recurrence=false, double_q=true, dueling=true, prioritized_replay=true)

policy = solve(solver, env)

sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, env, policy)
println("Total discounted reward for 1 simulation: $r_tot")

As the error says, DeepQLearning.jl expects the observations to be arrays of real numbers, so things like Vector{Float64} or Matrix{Float64} would work. But your observations are StaticVectors of Integers, which is unfortunately not currently supported.

(This is because the struct requires T <: Real and A <: AbstractArray{T}.)

I thought that too initially, so I changed the observe function to:

RL.observe(env::GridWorld) = [env.state[1], env.state[2]]

which returns a Vector{Int64}, but that also gives an error:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Int64}}

What am I missing? I thought Int64 is a subtype of Real, and Vector subtypes Array, which subtypes AbstractArray. Does it need to be Float64?
Or do I need to explicitly create an AbstractArray, then set the values and return it?

It’s slightly more subtle: if the rewards are Float64, then your observe should return a Vector{Float64} as well.

Also, can you create an issue on GitHub? We should automatically convert these to floats, since neural networks can only take floats as input anyway.
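Something along these lines inside the replay buffer setup, just to sketch the idea (not actual package code):

o = Float32.(observe(env))   # broadcast-convert whatever the environment returns to Float32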

Is this an issue for CommonRLInterface.jl or POMDPs.jl? Or both?

Also, after casting to float with:

RL.observe(env::GridWorld) = [Float64(env.state[1]), Float64(env.state[2])]

I still get the error:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}

I’ve noticed that Int64 and Float64 are subtypes of Real, and Vector is a subtype of AbstractArray, but Vector{Float64} is NOT a subtype of AbstractArray{Real}.
Try the following in a Julia REPL to see:
Float64 <: Real
Vector <: AbstractArray
Vector{Float64} <: AbstractArray{Real}
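For reference, these evaluate to:

julia> Float64 <: Real
true

julia> Vector <: AbstractArray
true

julia> Vector{Float64} <: AbstractArray{Real}
false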

Due to this, there may be a problem in the type checking.
If we are checking whether A{N} <: B{M}, then I would expect it to check that:
1) A <: B
2) N <: M
But currently it seems to check
2) N == M instead.

Consider this:

julia> Type{Vector{Float64}} <: AbstractArray{<:Real, N} where N
false

julia> Vector{Float64} <: AbstractArray{<:Real, N} where N
true

The issue does not have to do with Float64 <: Real or Vector{Float64} <: AbstractArray{Real} (the latter is indeed not true). Rather, somewhere some code got passed a type when it should have been passed an instance/object.


Do you mean this? From DeepQLearning, in prioritized_experience_replay.jl:

function PrioritizedReplayBuffer(env::AbstractEnv,
                                max_size::Int64,
                                batch_size::Int64;
                                rng::AbstractRNG = MersenneTwister(0),
                                α::Float32 = 6f-1,
                                β::Float32 = 4f-1,
                                ϵ::Float32 = 1f-3)
    o = observe(env)
    s_dim = size(o)
    experience = Vector{DQExperience{Int32, Float32, typeof(o)}}(undef, max_size) # This line!!
    priorities = Vector{Float32}(undef, max_size)
    _s_batch = zeros(Float32, s_dim..., batch_size)
    _a_batch = [CartesianIndex(0,0) for i=1:batch_size]
    _r_batch = zeros(Float32, batch_size)
    _sp_batch = zeros(Float32, s_dim..., batch_size)
    _done_batch = zeros(Float32, batch_size)
    _weights_batch = zeros(Float32, batch_size)
    return PrioritizedReplayBuffer(max_size, batch_size, rng, α, β, ϵ, 0, 1, priorities, experience,
                _s_batch, _a_batch, _r_batch, _sp_batch, _done_batch, _weights_batch)
end

Note this line:

experience = Vector{DQExperience{Int32, Float32, typeof(o)}}(undef, max_size) # This line!!

That line does not seem like the culprit; the parameters there should be types. What is the stack trace of the TypeError: in DQExperience error? Look in that trace.

StackTrace:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}
Stacktrace:
 [1] PrioritizedReplayBuffer(env::GridWorld, max_size::Int64, batch_size::Int64; rng::Random.MersenneTwister, α::Float32, β::Float32, ϵ::Float32)
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\prioritized_experience_replay.jl:48
 [2] PrioritizedReplayBuffer
   @ C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\prioritized_experience_replay.jl:46 [inlined]
 [3] initialize_replay_buffer(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, SVector{2, Int64}}}}, env::GridWorld, action_indices::Dict{SVector{2, Int64}, Int64})
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\solver.jl:185
 [4] solve(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, SVector{2, Int64}}}}, env::GridWorld)
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\solver.jl:48
 [5] top-level scope
   @ c:\Users\Tyler\Documents\Julia\Learning\GridWorld.jl:89
in expression starting at c:\Users\Tyler\Documents\Julia\Learning\GridWorld.jl:89

Note that the function I copied above comes from the stack trace. There is not much else there that seems suspicious.

Hmmm, I guess that must be the line, then. Is observe(env) returning a type instead of an instance?

RL.observe(env::GridWorld) = [Float64(env.state[1]), Float64(env.state[2])]

That should return a Vector{Float64}.

You are correct, though: removing typeof() caused it to complain that I gave it a value and not a type, which makes sense.

Yes, this is quite mysterious. Maybe print o and typeof(o) before line 48 to debug?

Results:
[2.0, 8.0]
Vector{Float64}

There is also a convert function defined; perhaps that is implicitly being used (incorrectly)?
Although it looks correct to me, I am out of ideas.

function Base.convert(::Type{DQExperience{Int32, Float32, C}}, x::DQExperience{A, B, C}) where {A, B, C}
    return DQExperience{Int32, Float32, C}(convert(C, x.s),
                                            convert(Int32, x.a),
                                            convert(Float32, x.r),
                                            convert(C, x.sp),
                                            x.done)
end

Let me know if you find a solution for this; otherwise, I am going to try to redefine my problem as a QuickPOMDP instead of an AbstractEnv.

Just tried this in the REPL:

julia> Vector{DQExperience{Int32, Float32, Vector{Float64}}}(undef, 10)
ERROR: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

julia> Vector{DQExperience{Int32, Float32, Vector{Float32}}}(undef, 10)
10-element Vector{DQExperience{Int32, Float32, Vector{Float32}}}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef

Apparently, as the code stands, observe needs to return an AbstractArray of Float32, because the observation's element type has to match the reward type.
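So a workaround until that changes would be to return Float32s from observe, e.g. (untested sketch):

RL.observe(env::GridWorld) = Float32[env.state[1], env.state[2]]   # returns a Vector{Float32}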

In my opinion, this is a bug due to the overly restrictive parameterization of the DQExperience type. Here is my proposed fix: https://github.com/JuliaPOMDP/DeepQLearning.jl/pull/64
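Roughly, the idea (my paraphrase of it, not necessarily the exact code in the PR) is to stop forcing the observation array's element type to equal the reward type T, something like:

struct DQExperience{N<:Real, T<:Real, A<:AbstractArray{<:Real}}
    s::A        # observation, any array of reals
    a::N
    r::T
    sp::A       # next observation
    done::Bool
end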

I would also argue that the way Julia displays this error is a confusing bug and should be fixed. @Tyler_Ingebrand, do you want to take a shot at reporting this bug to Julia with my help, or shall I? (I am trying to teach as many people as possible how to do this kind of thing 🙂)

Also, @Tyler_Ingebrand, can you try out the fix I suggested in the PR on your problem?