So there already seem to be some problems with your actions function, where
return [(Action(a)) for a in Iterators.product(0:12.-s.s2,0:12-s.s2) if sum(a)<=s.s1]
does not work, and might not do what you expect. First, the parser complains about 0:12.-s.s2, since it is not clear whether you mean 0:12.0 - s.s2 or 0:12 .- s.s2. Then the question is whether you want 0:(12 .- s.s2) or (0:12) .- s.s2. It is the first one that is actually computed, because the range operator : has lower precedence than subtraction.
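To make the precedence point concrete, here is a minimal sketch. The `State` and `Action` types and the name `valid_actions` are hypothetical stand-ins for whatever you have in your own code, and I am assuming the intended upper bound is `12 - s.s2`, which may not be what you want:

```julia
# The range operator binds more loosely than subtraction:
r1 = 0:12 .- 2      # parsed as 0:(12 .- 2), i.e. 0:10
r2 = (0:12) .- 2    # shifts the whole range, i.e. -2:10

# Hypothetical stand-ins for the types used in the question.
struct State
    s1::Int
    s2::Int
end

struct Action
    a::Tuple{Int,Int}
end

# Assumption: each component runs from 0 to 12 - s.s2 and the
# components together may not exceed s.s1.
function valid_actions(s::State)
    upper = 12 - s.s2   # computed separately, so there is no ambiguity
    return [Action(a) for a in Iterators.product(0:upper, 0:upper) if sum(a) <= s.s1]
end
```

Computing the bound on its own line (or writing `0:(12 - s.s2)` with explicit parentheses) avoids both the parse error and the precedence question.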
For the actual question, I’m not quite sure exactly what you want. My understanding is that you want to loop over all states and actions, and if an action is not valid for a certain state, you want to set the corresponding table entry to a large negative value? If so, you could probably do it somewhat like this, though it might not be the most efficient way:

    for s in states(mdp)
        valid_actions = A(s)
        for a in actions(mdp)
            if !(a in valid_actions)
                solver.Q_vals[state2idx(s), act2idx(a)] = -9999
            end
        end
    end

Here I assume that you have some functions (state2idx and act2idx above) that map a state and an action to their indices in the table.
That was what I meant by assuming you had a function doing the conversion: you somehow have to choose a mapping (or maybe one already exists in the POMDPs package) that says which state/action gets mapped to which index.
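As a rough illustration of one such mapping, the sketch below numbers states and actions by their position in the enumeration order and then masks the invalid entries. Everything here is a toy assumption: the states, the actions, the validity rule, and the plain `Q_vals` matrix stand in for your own `mdp` and `solver`.

```julia
# Toy stand-ins: states are Ints, actions are Symbols (both hypothetical).
all_states  = [1, 2, 3]
all_actions = [:left, :stay, :right]

# Hypothetical validity rule: state 1 cannot go left, state 3 cannot go right.
function valid_actions(s)
    return filter(a -> !(s == 1 && a == :left) && !(s == 3 && a == :right), all_actions)
end

# One possible mapping: index = position in the enumeration order.
state_index  = Dict(s => i for (i, s) in enumerate(all_states))
action_index = Dict(a => i for (i, a) in enumerate(all_actions))
state2idx(s) = state_index[s]
act2idx(a)   = action_index[a]

# Table with one row per state and one column per action.
Q_vals = zeros(length(all_states), length(all_actions))

# Give every invalid (state, action) pair a large negative value.
for s in all_states
    valid = valid_actions(s)
    for a in all_actions
        if !(a in valid)
            Q_vals[state2idx(s), act2idx(a)] = -9999
        end
    end
end
```

Any consistent one-to-one mapping would do; the dictionaries are just the simplest thing that works for small spaces. If your problem is defined through POMDPs.jl, I believe its `stateindex(mdp, s)` and `actionindex(mdp, a)` functions are meant for this purpose, but check that they are implemented for your model.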