Using AbstractEnv from CommonRLInterface with POMDPs

Hello. I am trying to use reinforcement learning to solve a simple problem as a proof of concept. I am using DeepQLearning and defining an AbstractEnv (the interface from CommonRLInterface). I am getting the following error and cannot figure out how to make it happy. I believe CommonRLInterface and DeepQLearning should work together, but maybe I misunderstood the documentation. Has anyone seen this error, and do you know how to fix it?

LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{SVector{2, Integer}}

I figured out that it is referring to this struct:

struct DQExperience{N <: Real, T <: Real, A <: AbstractArray{T}}
    s::A        # observation
    a::N        # action index
    r::T        # reward
    sp::A       # next observation
    done::Bool  # terminal flag
end

Source code of my project:

# A simple grid world MDP
# All cells with reward are also terminal

using CommonRLInterface
using StaticArrays
using Compose
using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies
import ColorSchemes

const RL = CommonRLInterface

mutable struct GridWorld <: AbstractEnv
    size::SVector{2, Integer}
    rewards::Dict{SVector{2, Integer}, Float64}
    state::SVector{2, Integer}
end

function GridWorld()
    rewards = Dict(SA[9,3]=> 10.0,
                   SA[8,8]=>  3.0,
                   SA[4,3]=>-10.0,
                   SA[4,6]=> -5.0)
    return GridWorld(SA[10, 10], rewards, SA[rand(1:10), rand(1:10)])
end

RL.reset!(env::GridWorld) = (env.state = SA[rand(1:env.size[1]), rand(1:env.size[2])])
RL.actions(env::GridWorld) = (SA[1,0], SA[-1,0], SA[0,1], SA[0,-1])
POMDPs.actions(env::GridWorld) = RL.actions(env::GridWorld)
RL.observe(env::GridWorld) = env.state
RL.terminated(env::GridWorld) = haskey(env.rewards, env.state)

function RL.act!(env::GridWorld, a)
    if rand() < 0.4 # 40% chance of going in a random direction (= 30% chance of going in a wrong direction)
        a = rand(POMDPs.actions(env))
    end
    env.state = clamp.(env.state + a, SA[1,1], env.size)
    return get(env.rewards, env.state, 0.0)
end

# optional functions
@provide RL.observations(env::GridWorld) = [SA[x, y] for x in 1:env.size[1], y in 1:env.size[2]]
@provide RL.clone(env::GridWorld) = GridWorld(env.size, copy(env.rewards), env.state)
@provide RL.state(env::GridWorld) = env.state
@provide RL.setstate!(env::GridWorld, s) = (env.state = s)

@provide function RL.render(env::GridWorld)
    nx, ny = env.size
    cells = []
    for s in observations(env)
        r = get(env.rewards, s, 0.0)
        clr = get(ColorSchemes.redgreensplit, (r+10.0)/20.0)
        cell = context((s[1]-1)/nx, (ny-s[2])/ny, 1/nx, 1/ny)
        compose!(cell, rectangle(), fill(clr), stroke("gray"))
        push!(cells, cell)
    end
    grid = compose(context(), linewidth(0.5mm), cells...)
    outline = compose(context(), linewidth(1mm), rectangle(), stroke("gray"))
    s = env.state
    agent_ctx = context((s[1]-1)/nx, (ny-s[2])/ny, 1/nx, 1/ny)
    agent = compose(agent_ctx, circle(0.5, 0.5, 0.4), fill("orange"))
    sz = min(w,h)
    return compose(context((w-sz)/2, (h-sz)/2, sz, sz), agent, grid, outline)
end

# load MDP model from POMDPModels or define your own!
env = GridWorld();

# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(POMDPs.actions(env))))

exploration = EpsGreedyPolicy(env, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))

solver = DeepQLearningSolver(qnetwork = model, max_steps=10000,
                             exploration_policy = exploration,
                             learning_rate=0.005, log_freq=500,
                             recurrence=false, double_q=true, dueling=true, prioritized_replay=true)

policy = solve(solver, env)

sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, env, policy)
println("Total discounted reward for 1 simulation: $r_tot")

As the error says, DeepQLearning.jl expects the observations to be arrays of real numbers, so things like Vector{Float64} or Matrix{Float64} would work. But your observations are StaticVectors of Integers, which is unfortunately not currently supported.

(This is because the struct requires T <: Real and A <: AbstractArray{T}.)

I thought that too initially, so I changed the observe function to:

RL.observe(env::GridWorld) = [env.state[1], env.state[2]]

which returns a Vector{Int64}, but that also gives an error:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Int64}}

What am I missing? I thought Int64 is a subtype of Real, and Vector subtypes Array, which subtypes AbstractArray. Does it need to be Float64?
Or do I need to explicitly create an AbstractArray, then set the values and return it?

It’s slightly more subtle: if the rewards are Float64, then your observe should return a Vector{Float64} as well.

Also, can you create an issue on GitHub? We should automatically convert these to floats, since neural networks can only take floats as input anyway.
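Something along these lines inside the replay buffer setup, just to sketch the idea (not actual package code):

o = Float32.(observe(env))   # broadcast-convert whatever the environment returns to Float32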

Is this an issue for CommonRLInterface.jl or POMDPs.jl? Or both?

Also, after casting to float with:

RL.observe(env::GridWorld) = [Float64(env.state[1]), Float64(env.state[2])]

I still get the error:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}

I’ve noticed that Int64 and Float64 are subtypes of Real, and Vector is a subtype of AbstractArray, but Vector{Float64} is NOT a subtype of AbstractArray{Real}.
Try the following in a Julia REPL to see:
Float64 <: Real
Vector <: AbstractArray
Vector{Float64} <: AbstractArray{Real}
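For reference, these evaluate to:

julia> Float64 <: Real
true

julia> Vector <: AbstractArray
true

julia> Vector{Float64} <: AbstractArray{Real}
false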

Due to this, there may be a problem in the type checking.
If we are checking whether A{N} <: B{M}, then I would expect it to check that:
1) A <: B
2) N <: M
But currently it seems to check
2) N == M instead.

Consider this:

julia> Type{Vector{Float64}} <: AbstractArray{<:Real, N} where N
false

julia> Vector{Float64} <: AbstractArray{<:Real, N} where N
true

The issue does not have to do with Float64 <: Real or Vector{Float64} <: AbstractArray{Real} (the latter is indeed not true). Rather, somewhere some code got passed a type when it should have been passed an instance/object.


Do you mean this? From DeepQLearning, in prioritized_experience_replay.jl:

function PrioritizedReplayBuffer(env::AbstractEnv,
                                max_size::Int64,
                                batch_size::Int64;
                                rng::AbstractRNG = MersenneTwister(0),
                                α::Float32 = 6f-1,
                                β::Float32 = 4f-1,
                                ϵ::Float32 = 1f-3)
    o = observe(env)
    s_dim = size(o)
    experience = Vector{DQExperience{Int32, Float32, typeof(o)}}(undef, max_size) # This line!!
    priorities = Vector{Float32}(undef, max_size)
    _s_batch = zeros(Float32, s_dim..., batch_size)
    _a_batch = [CartesianIndex(0,0) for i=1:batch_size]
    _r_batch = zeros(Float32, batch_size)
    _sp_batch = zeros(Float32, s_dim..., batch_size)
    _done_batch = zeros(Float32, batch_size)
    _weights_batch = zeros(Float32, batch_size)
    return PrioritizedReplayBuffer(max_size, batch_size, rng, α, β, ϵ, 0, 1, priorities, experience,
                _s_batch, _a_batch, _r_batch, _sp_batch, _done_batch, _weights_batch)
end

Note this line:

experience = Vector{DQExperience{Int32, Float32, typeof(o)}}(undef, max_size) # This line!!

That line does not seem like the culprit; the parameters there should be types. What is the stack trace of the TypeError: in DQExperience error? Look in that trace.

StackTrace:

ERROR: LoadError: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}
Stacktrace:
 [1] PrioritizedReplayBuffer(env::GridWorld, max_size::Int64, batch_size::Int64; rng::Random.MersenneTwister, α::Float32, β::Float32, ϵ::Float32)
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\prioritized_experience_replay.jl:48
 [2] PrioritizedReplayBuffer
   @ C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\prioritized_experience_replay.jl:46 [inlined]
 [3] initialize_replay_buffer(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, SVector{2, Int64}}}}, env::GridWorld, action_indices::Dict{SVector{2, Int64}, Int64})
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\solver.jl:185
 [4] solve(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, SVector{2, Int64}}}}, env::GridWorld)
   @ DeepQLearning C:\Users\Tyler\.julia\packages\DeepQLearning\Uet74\src\solver.jl:48
 [5] top-level scope
   @ c:\Users\Tyler\Documents\Julia\Learning\GridWorld.jl:89
in expression starting at c:\Users\Tyler\Documents\Julia\Learning\GridWorld.jl:89

Note that the function I copied above comes from the stack trace. There is not much else there that seems suspicious.

Hmmm, I guess that must be the line, then. Is observe(env) returning a type instead of an instance?

RL.observe(env::GridWorld) = [Float64(env.state[1]), Float64(env.state[2])]

That should return a Vector{Float64}.

You are correct, though: removing typeof() caused it to complain that I gave it a value and not a type, which makes sense.

Yes, this is quite mysterious. Maybe print o and typeof(o) before line 48 to debug?

Results:
[2.0, 8.0]
Vector{Float64}

There is also a convert function defined; perhaps that is implicitly being used (incorrectly)?
Although it looks correct to me, I am out of ideas.

function Base.convert(::Type{DQExperience{Int32, Float32, C}}, x::DQExperience{A, B, C}) where {A, B, C}
    return DQExperience{Int32, Float32, C}(convert(C, x.s),
                                            convert(Int32, x.a),
                                            convert(Float32, x.r),
                                            convert(C, x.sp),
                                            x.done)
end

Let me know if you find a solution for this; otherwise, I am going to try to redefine my problem as a QuickPOMDP instead of an AbstractEnv.

Just tried this in the REPL:

julia> Vector{DQExperience{Int32, Float32, Vector{Float64}}}(undef, 10)
ERROR: TypeError: in DQExperience, in A, expected A<:(AbstractArray{T<:Real, N} where N), got Type{Vector{Float64}}
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

julia> Vector{DQExperience{Int32, Float32, Vector{Float32}}}(undef, 10)
10-element Vector{DQExperience{Int32, Float32, Vector{Float32}}}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef

Apparently, as the code stands, observe needs to return an AbstractArray of Float32, because the observation's element type has to match the reward type.
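So a workaround until that changes would be to return Float32s from observe, e.g. (untested sketch):

RL.observe(env::GridWorld) = Float32[env.state[1], env.state[2]]   # returns a Vector{Float32}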

In my opinion, this is a bug due to the overly restrictive parameterization of the DQExperience type. Here is my proposed fix: https://github.com/JuliaPOMDP/DeepQLearning.jl/pull/64
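Roughly, the idea (my paraphrase of it, not necessarily the exact code in the PR) is to stop forcing the observation array's element type to equal the reward type T, something like:

struct DQExperience{N<:Real, T<:Real, A<:AbstractArray{<:Real}}
    s::A        # observation, any array of reals
    a::N
    r::T
    sp::A       # next observation
    done::Bool
end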

I would also argue that the way Julia displays this error is a confusing bug and should be fixed. @Tyler_Ingebrand, do you want to take a shot at reporting this bug to Julia with my help, or shall I? (I am trying to teach as many people as possible how to do this kind of thing 🙂)

Also, @Tyler_Ingebrand, can you try out the fix I suggested in the PR on your problem?