Thanks for the detailed explanation. I did try your suggestions, such as adding ReLU to the layers. However, what I am noticing is that the network only updates in my case when there is no tanh in the final layer, roughly as in the sketch below.
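Just to be concrete about what I mean, here is a minimal sketch of the two final-layer variants I compared (the layer sizes are hypothetical, not my actual network):

```python
import torch.nn as nn

# Variant where the actor barely updates for me: tanh bounds the output.
actor_with_tanh = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),
    nn.Linear(32, 2), nn.Tanh(),  # output squashed into [-1, 1]
)

# Variant that does update for me: no activation on the final layer.
actor_no_tanh = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),
    nn.Linear(32, 2),  # unbounded output
)
```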
What I am unable to understand is the code snippet in your answer. If the actor's network design remains the same, why is it able to update in your case? Did you change anything else?
One more thing I notice is that when I run the update function more than once, the parameter values seem to remain the same. I guess the network just does not change much after the first training step. I checked this by snapshotting the parameters around each call, as in the sketch below.
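Here is roughly how I am checking whether the parameters actually move between calls (the tiny network, dummy loss, and `update()` are hypothetical stand-ins for my setup, just to make the snippet self-contained):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a tiny actor and a one-step update.
actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

def update():
    optimizer.zero_grad()
    loss = -actor(torch.randn(8, 4)).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()

# Snapshot the parameters, run one update, then measure the largest change
# in each parameter tensor.
before = {n: p.detach().clone() for n, p in actor.named_parameters()}
update()
for n, p in actor.named_parameters():
    delta = (p.detach() - before[n]).abs().max().item()
    print(f"{n}: max |change| = {delta:.3e}")
```

When I run something like this repeatedly on my actual network, the printed changes after the first call are essentially zero, which is why I suspect the training stalls after the first update.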