Help, I am new: Is this MDP right?

Hi, first a comment on formatting code: you could enclose your complete MWE in triple backticks and it would show as code on Discourse.

That said your own definition

struct MyMDP <: MDP{Int,Int} end

does indeed not contain a member indices, which these functions

POMDPs.stateindex(m::MyMDP, s) = m.indices[s]
POMDPs.actionindex(m::MyMDP, a) = m.indices[a]

try to access. How did you get the idea this could work?
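By the way, for integer states and actions 0 through 3 you would not even need a dictionary. A minimal sketch, assuming stateindex and actionindex only have to map each state or action to its position (1 to 4) in the lists returned by states and actions:

POMDPs.stateindex(m::MyMDP, s) = s + 1   # state 0 is the 1st state, ..., state 3 the 4th
POMDPs.actionindex(m::MyMDP, a) = a + 1  # same offset for the actions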

Hello,

First of all, thank you so much for your quick response, and sorry for the formatting.

I tried to follow the documentation. How can I define the member indices?

Thanks!

Thanks. Your example looks a bit like this one. I tried

struct MyMDP <: MDP{Int,Int}
    indices::Dict{Int, Int}
    MyMDP() = new(Dict{Int, Int}())
end

but it looks to me like you have to fill the index dictionary with entries mapping each state (and action) to its index, like

Dict{Int, Int}(0=>? ...)

?
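Something like this, assuming your four states 0:3 should simply map to the positions 1:4 that POMDPs.jl expects:

Dict{Int, Int}(0 => 1, 1 => 2, 2 => 3, 3 => 4)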

Hi @goerch,

I’ve tried what you said:

using POMDPs
using POMDPModelTools: Deterministic, Uniform, SparseCat
using TabularTDLearning
using POMDPPolicies
using POMDPModels
struct MyMDP <: MDP{Int,Int}
    indices::Dict{Int, Int}
    MyMDP() = new(Dict(0=>1,1=>2,2=>2,3=>4))
end
mdp = MyMDP()
POMDPs.actions(m::MyMDP) = [0,1,2,3]
POMDPs.states(m::MyMDP) = [0,1,2,3]
POMDPs.discount(m::MyMDP) = 0.95
POMDPs.stateindex(m::MyMDP, s) = m.indices[s]
POMDPs.actionindex(m::MyMDP, a) = m.indices[a]
POMDPs.initialstate(m::MyMDP) = Uniform([0,1,2,3])
function POMDPs.transition(m::MyMDP, s, a)
    if s == 0 && a == 0
        return SparseCat([0,1,2,3], [1,0,0,0])
    elseif s == 0 && a == 1
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 0 && a == 2
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 0 && a == 3
        return SparseCat([0,1,2,3], [0,0.2,0.5,0.3])
    elseif s == 1 && a == 0
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 1 && a == 1
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 1 && a == 2
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 2 && a == 0
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 2 && a == 1
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 0
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    end
end
function POMDPs.reward(m::MyMDP, s, a)
    if s == 0 && a == 0
        return -45
    elseif s == 0 && a == 1
        return -40
    elseif s == 0 && a == 2
        return -50
    elseif s == 0 && a == 3
        return -70
    elseif s == 1 && a == 0
        return -14
    elseif s == 1 && a == 1
        return -44
    elseif s == 1 && a == 2
        return -54
    elseif s == 1 && a == 3
        return -74
    elseif s == 2 && a == 0
        return -8
    elseif s == 2 && a == 1
        return -38
    elseif s == 3 && a == 0
        return -12
    end
end
q_learning_solver = QLearningSolver(n_episodes=10,
                                    learning_rate=0.8,
                                    exploration_policy=EpsGreedyPolicy(mdp, 0.5),
                                    verbose=false);
q_learning_policy = solve(q_learning_solver, mdp);

But I get “ERROR: ArgumentError: Sampler for this object is not defined” for

q_learning_policy = solve(q_learning_solver, mdp);

I have also tried

exppolicy = EpsGreedyPolicy(mdp, 0.01)
solver = QLearningSolver(exppolicy, learning_rate=0.1, n_episodes=50, max_episode_length=50, eval_every=50, n_eval_traj=100)
policy = solve(solver, mdp)

which I found here, but in

solver = QLearningSolver(exppolicy, learning_rate=0.1, n_episodes=50, max_episode_length=50, eval_every=50, n_eval_traj=100)

It says: ERROR: MethodError: no method matching QLearningSolver(::EpsGreedyPolicy{POMDPPolicies.var"#17#18"{Float64}, Random._GLOBAL_RNG, Vector{Int64}}; learning_rate=0.1, n_episodes=50, max_episode_length=50, eval_every=50, n_eval_traj=100)

I do not understand what happens. Thank you for your help.
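A note on this MethodError: judging from the keyword-only constructor call that works further down in this thread, QLearningSolver apparently accepts the exploration policy only as a keyword argument, so the positional form cannot match any method. A sketch of the keyword form:

solver = QLearningSolver(exploration_policy=exppolicy,
                         learning_rate=0.1,
                         n_episodes=50,
                         max_episode_length=50,
                         eval_every=50,
                         n_eval_traj=100)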

I instrumented transition and reward

function POMDPs.transition(m::MyMDP, s, a)
    if s == 0 && a == 0
        return SparseCat([0,1,2,3], [1,0,0,0])
[...]
    else
        @assert false
    end    
end

function POMDPs.reward(m::MyMDP, s, a)
    if s == 0 && a == 0
        return -45
[...]
    else
        @assert false
    end        
end    

and get

ERROR: AssertionError: false

in transition. So it seems this function is incomplete? For example, s == 1 && a == 3 and s == 2 && a == 2 hit no branch.
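That would also explain the Sampler error: when no branch matches, your original transition implicitly returns nothing, and sampling a next state from nothing raises exactly that message. A quick REPL check (my diagnosis, not part of the original error output):

julia> rand(nothing)
ERROR: ArgumentError: Sampler for this object is not defined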

Hello,

I have also tried:

function POMDPs.transition(m::ClinicalTrialsMDP, s, a)
    if s == 0 && a == 0 
        return SparseCat([0,1,2,3], [1,0,0,0])
    elseif s == 0 && a == 1 
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 0 && a == 2 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 0 && a == 3 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])

    elseif s == 1 && a == 0 
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 1 && a == 1 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 1 && a == 2 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 1 && a == 3 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
        
    elseif s == 2 && a == 0 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 2 && a == 1 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 2 && a == 2 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 2 && a == 3 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])

    elseif s == 3 && a == 0 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 1 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 2 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 3 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    end
end 


function POMDPs.reward(m::ClinicalTrialsMDP, s, a)
    if s == 0 && a == 0 
        return -45
    elseif s == 0 && a == 1 
        return -40
    elseif s == 0 && a == 2 
        return -50
    elseif s == 0 && a == 3 
        return -70
    
    elseif s == 1 && a == 0 
        return -14
    elseif s == 1 && a == 1 
        return -44
    elseif s == 1 && a == 2 
        return -54
    elseif s == 1 && a == 3 
        return -74   

    elseif s == 2 && a == 0 
        return -8
    elseif s == 2 && a == 1 
        return -38
    
    elseif s == 3 && a == 0 
        return -12
    end
end 

But it is still not working. (Note that these methods are defined for ClinicalTrialsMDP, while mdp above is a MyMDP, so unless the type was renamed consistently they are never even called.)

The following variation

using POMDPs
using POMDPModelTools
using QuickPOMDPs: QuickPOMDP
using TabularTDLearning
using POMDPPolicies
using POMDPModels
using Parameters, Random

struct MyMDP <: MDP{Int,Int} 
    indices::Dict{Int, Int}
    MyMDP() = new(Dict{Int, Int}(0=>1,1=>2,2=>3,3=>4))
end

mdp = MyMDP()

POMDPs.actions(m::MyMDP) = [0,1,2,3]
POMDPs.states(m::MyMDP) = [0,1,2,3]
POMDPs.discount(m::MyMDP) = 0.95
POMDPs.stateindex(m::MyMDP, s) = m.indices[s]
POMDPs.actionindex(m::MyMDP, a) = m.indices[a]
POMDPs.initialstate(m::MyMDP) = Uniform([0,1,2,3])

function POMDPs.transition(m::MyMDP, s, a)
    if s == 0 && a == 0 
        return SparseCat([0,1,2,3], [1,0,0,0])
    elseif s == 0 && a == 1 
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 0 && a == 2 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 0 && a == 3 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])

    elseif s == 1 && a == 0 
        return SparseCat([0,1,2,3], [0.7, 0.3,0,0])
    elseif s == 1 && a == 1 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 1 && a == 2 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 1 && a == 3 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
        
    elseif s == 2 && a == 0 
        return SparseCat([0,1,2,3], [0.2, 0.5,0.3,0])
    elseif s == 2 && a == 1 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 2 && a == 2 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 2 && a == 3 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])

    elseif s == 3 && a == 0 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 1 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 2 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    elseif s == 3 && a == 3 
        return SparseCat([0,1,2,3], [0, 0.2,0.5,0.3])
    else
        @show "transition", s, a
        return Uniform([0, 1, 2, 3])
    end    
end

function POMDPs.reward(m::MyMDP, s, a)
    if s == 0 && a == 0
        return -45
    elseif s == 0 && a == 1
        return -40
    elseif s == 0 && a == 2
        return -50
    elseif s == 0 && a == 3
        return -70
    elseif s == 1 && a == 0
        return -14
    elseif s == 1 && a == 1
        return -44
    elseif s == 1 && a == 2
        return -54
    elseif s == 1 && a == 3
        return -74  
    elseif s == 2 && a == 0
        return -8
    elseif s == 2 && a == 1
        return -38
    elseif s == 3 && a == 0
        return -12
    else
        @show "reward", s, a
        return 0
    end        
end    

q_learning_solver = QLearningSolver(n_episodes=10,
                                    learning_rate=0.8,
                                    exploration_policy=EpsGreedyPolicy(mdp, 0.5),
                                    verbose=false);
q_learning_policy = solve(q_learning_solver, mdp);    

does something for me without showing errors. I don’t know if it is what you intended it to do.
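To sanity-check the result you could ask the learned policy for its action in each state, e.g. (a sketch, assuming the returned policy supports the standard POMDPs.jl action function):

for s in states(mdp)
    println("state ", s, " => action ", action(q_learning_policy, s))
end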


Hello,

Thank you so much for your help! It works, but I first need to interpret the results. Can you explain why you use this in transition:

else
    @show "transition", s, a
    return Uniform([0, 1, 2, 3])
end

and in reward

else
    @show "reward", s, a
    return 0
end

I only tried to make the functions well-defined, i.e. returning some value in every case (which seems to be required). It at least showed that some cases in reward were not defined…


Hello,

I understand, but doing it like you did means that every transition is possible. What if I am in state 2 (s == 2) and can only take actions 0 and 1 (a == 0, a == 1)?

If I have understood your code correctly, when you do:

 @show "transition", s, a
        return Uniform([0, 1, 2, 3])
    end    
end

Every transition is possible, right?
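A follow-up sketch for that: one common way in POMDPs.jl to express that only some actions are available in a state is a state-dependent actions method. This version is based on the state/action pairs your reward function covers; whether the Q-learning exploration policy actually respects it is something you would have to verify:

function POMDPs.actions(m::MyMDP, s)
    if s == 2
        return [0, 1]        # only actions 0 and 1 are defined in state 2
    elseif s == 3
        return [0]           # only action 0 is defined in state 3
    else
        return [0, 1, 2, 3]  # all actions available in states 0 and 1
    end
end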