How can I change specific values in an array?

Hello,

I am new to Julia and I am using the POMDPs.jl package, specifically TabularTDLearning.jl.

ValuePolicy has a property called value_table, which holds the Q-values and can be both read and modified. For example, setting the table to all zeros would be:

solver.Q_vals = zeros(length(states(mdp)), length(actions(mdp)))
solve(solver, mdp)

In this table, the rows are the states and the columns are the actions.

In my case states and actions are structs:

struct State
    s1::Int 
    s2::SVector{2, Int} 
    s3::Int 
end

struct Action 
    a::SVector{2, Int} 
end

Then the whole state space and action space are defined as follows:

states = SS, #State-space
actions = AS, #Action-space

I have created a function that, given a state, returns all the possible actions:

function A(s::State)
    return  [(Action(a)) for a in Iterators.product(0:12.-s.s2[1],0:12-s.s2[2]) if sum(a)<=s.s1]
end

My objective is to go column by column through the value_table and, for every action that is not possible, set the corresponding entry to a value such as -9999.

I have tried many loops, but none of them work. What is the best way to do this? Maybe a function?

Thanks in advance.

There already seem to be some problems with your actions function:

function A(s::State)
    return  [(Action(a)) for a in Iterators.product(0:12.-s.s2[1],0:12-s.s2[2]) if sum(a)<=s.s1]
end

It does not work as written, and it might not do what you expect. First, Julia complains about 0:12.-s.s2[1], since it is not clear whether you mean 0:12. - s.s2[1] (with the float literal 12.0) or 0:12 .- s.s2[1] (with broadcast subtraction). Then the question is whether you want 0:(12 .- s.s2[1]) or (0:12) .- s.s2[1]. The first one is what actually gets computed, because subtraction binds more tightly than the range operator.

For the actual question, I’m not quite sure what you want. My understanding is that you want to loop over all states and actions and, if an action is not valid for a certain state, write a large negative value into some table. If so, you could do it roughly like this, though it might not be the most efficient way:
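A small sketch of the precedence point, followed by one way the function could be written with explicit parentheses. It assumes the intended upper bound of each range is 12 minus the corresponding entry of s2, and it swaps SVector for plain tuples so that it runs without any packages; both choices are assumptions for illustration, not the original definitions.

```julia
# Subtraction binds more tightly than the range operator `:`.
x = 3
r_default = 0:12 .- x      # parsed as 0:(12 .- x)
r_upper   = 0:(12 - x)     # explicit version of the same range
r_shifted = (0:12) .- x    # shifts every element of 0:12 instead

# Stand-ins for the thread's types, with NTuple in place of SVector.
struct State
    s1::Int
    s2::NTuple{2, Int}
    s3::Int
end

struct Action
    a::NTuple{2, Int}
end

# Explicit parentheses make the upper bound of each range unambiguous.
function A(s::State)
    return [Action(a)
            for a in Iterators.product(0:(12 - s.s2[1]), 0:(12 - s.s2[2]))
            if sum(a) <= s.s1]
end

println(r_default == r_upper)
println(collect(r_shifted))
println(A(State(2, (11, 12), 0)))
```

Writing the parentheses explicitly costs nothing and removes the need to remember the precedence rule when reading the code later.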

for s in states(mdp)
    valid_actions = A(s)
    for a in actions(mdp)
        if !(a in valid_actions)
            solver.Q_vals[state2idx(s), act2idx(a)] = -9999
        end
    end
end

Here I assume that you have some functions that map a state and an action to their indices in the table.
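For illustration, the masking loop can be tried on a toy problem. Everything here is hypothetical: states and actions are plain integers, state2idx and act2idx are simple dictionary lookups, and the validity rule is made up. It is only a sketch of the idea, not how the package maps states to rows.

```julia
# Toy problem: three states and three actions, identified by plain integers.
all_states = [1, 2, 3]
all_actions = [0, 1, 2]

# Hypothetical index helpers: the position of each state/action in its list.
state_index = Dict(s => i for (i, s) in enumerate(all_states))
action_index = Dict(a => j for (j, a) in enumerate(all_actions))
state2idx(s) = state_index[s]
act2idx(a) = action_index[a]

# Made-up validity rule: in state s only actions a < s are allowed.
valid(s) = [a for a in all_actions if a < s]

# Rows are states, columns are actions, as in the thread.
Q = zeros(length(all_states), length(all_actions))
for s in all_states
    valid_actions = valid(s)
    for a in all_actions
        if !(a in valid_actions)
            Q[state2idx(s), act2idx(a)] = -9999
        end
    end
end

display(Q)
```

If the list of valid actions is long, converting it to a Set before the inner loop would make the membership check cheaper.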

Hello,

This is fixed:

Yes, exactly.

When I try your code I get:

UndefVarError: state2idx not defined

I have looked for this, but it does not work. I got:


ERROR: MethodError: no method matching setindex!(::Nothing, ::Int64, ::Int64, ::Int64)

That is what I meant by assuming you had a function doing the conversion. You somehow have to choose a mapping (or maybe one already exists in the POMDPs package) that says which state or action gets mapped to which index.

Yes, I found this and did:

for s in states(mdp)
    valid_actions = A(s)
    for a in actions(mdp)
        if !(a in valid_actions)
            solver.Q_vals[POMDPs.stateindex(mdp, s) , POMDPs.actionindex(mdp, a) ] = -9999
        end
    end
end

But the error was:

ERROR: MethodError: no method matching setindex!(::Nothing, ::Int64, ::Int64, ::Int64)

That seems to say that solver.Q_vals is nothing. Are you sure that this is the field you are supposed to use? I'm on my phone right now, so I'm too lazy to look it up.

I am not really sure, but I found this field definition, which might be the reason: Q_vals::Union{Nothing, Matrix{Float64}} = nothing

It does look like that would be the case.
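The error can be reproduced without the package. In this sketch, ToySolver is a hypothetical stand-in that only copies the field definition quoted above; it is not the real solver type.

```julia
# ToySolver is a hypothetical stand-in, not the package's solver type.
Base.@kwdef mutable struct ToySolver
    Q_vals::Union{Nothing, Matrix{Float64}} = nothing
end

solver = ToySolver()

# Indexed assignment into `nothing` has no setindex! method, so it throws.
err = try
    solver.Q_vals[1, 1] = -9999
    nothing
catch e
    e
end
println(err isa MethodError)

# Allocating the table first makes the same assignment valid.
solver.Q_vals = zeros(3, 2)
solver.Q_vals[1, 1] = -9999
println(solver.Q_vals[1, 1])
```

The field defaults to nothing, so any write into the table has to come after an explicit allocation.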

Maybe just try to initialize it like you showed in the first post?

Solution:

solver.Q_vals = zeros(length(states(mdp)), length(actions(mdp)))

Add this before your code.

Thank you so much.