I want to implement a loss function for the weight update of the neural network in my reinforcement learning (DQN) project. The Q-value of the next state should be estimated by my target network (yellow mark), whereas the value of the current action should be estimated by my main network, which is the one I want to update. My problem is that I don’t know how to get the estimated value of a specific action for a given state (the red mark).
The loss function I want to implement:
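Written out, and assuming this is the standard DQN loss, it would look roughly as follows. The symbols are my own notation and an assumption: discount factor $\gamma$, reward $r$, main-network weights $\theta$, and target-network weights $\theta^{-}$.

$$
L(\theta) = \Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \Big)^{2}
$$

Here $\max_{a'} Q(s', a'; \theta^{-})$ is the next-state estimate from the target network (the yellow mark), and $Q(s, a; \theta)$ is the main network's estimate for the action that was actually taken (the red mark).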
My implementation (which doesn’t work):
Do you know how to implement the current estimate for a given action (i.e. the output of the neural network at index a)?
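To make "the output of the network at index a" concrete, here is a minimal, framework-agnostic sketch in plain Python. It assumes the network returns one Q-value per action for each state in the batch, so the estimate for the chosen action is just that row indexed by the action. The names `select_action_values`, `dqn_loss`, `dones`, and `gamma` are hypothetical and not from my project; the `dones` flag (no bootstrapping on terminal states) is also an assumption.

```python
def select_action_values(q_values, actions):
    """Pick Q(s, a) for each sample: row i of q_values indexed by actions[i]."""
    return [q_row[a] for q_row, a in zip(q_values, actions)]


def dqn_loss(q_main, q_target_next, actions, rewards, dones, gamma=0.99):
    """Mean squared TD error over a batch (hypothetical helper).

    q_main:        main-network outputs for the current states, one row per sample
    q_target_next: target-network outputs for the next states, one row per sample
    actions:       index of the action taken in each current state
    rewards:       reward received for each transition
    dones:         True where the next state is terminal (assumed, not in the post)
    """
    q_sa = select_action_values(q_main, actions)
    squared_errors = []
    for q, q_next_row, r, done in zip(q_sa, q_target_next, rewards, dones):
        # Target uses the best next-state value from the target network;
        # terminal transitions do not bootstrap.
        bootstrap = 0.0 if done else gamma * max(q_next_row)
        target = r + bootstrap
        squared_errors.append((target - q) ** 2)
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    q_main = [[1.0, 2.0], [0.5, 0.0]]
    q_target_next = [[0.0, 3.0], [1.0, 1.0]]
    loss = dqn_loss(q_main, q_target_next, actions=[1, 0],
                    rewards=[1.0, 0.0], dones=[False, True], gamma=0.5)
    print(loss)
```

In a tensor library the same per-row selection is usually done with a gather or a one-hot mask rather than a Python loop, for example `torch.gather(q_values, 1, actions.unsqueeze(1))` in PyTorch, or multiplying by `tf.one_hot(actions, num_actions)` and summing over the action axis in TensorFlow. Which one applies depends on the framework, and the target term should be treated as a constant so that no gradient flows into the target network.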
Thank you for your help!