Suggestions for improving learning in a neural-network-based controller for a rotary inverted pendulum

Thanks for the suggestion. I will penalise the control effort. The main problem is that learning takes too long to reduce the loss (i.e., to get the pendulum balanced).
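Here is a minimal sketch of what penalising the control effort could look like. It assumes a quadratic per-step cost on the pendulum angle, its angular velocity, and the control input `u`. The function names `step_cost` and `episode_loss` and the weights `w_theta`, `w_theta_dot`, and `w_u` are hypothetical placeholders, not values from this thread. A rotary pendulum also has an arm angle, and you may want to add a term for it in the same way.

```python
import math


def step_cost(theta, theta_dot, u, w_theta=1.0, w_theta_dot=0.1, w_u=0.01):
    """Quadratic per-step cost (hypothetical weights).

    theta:     pendulum angle measured from upright (rad)
    theta_dot: pendulum angular velocity (rad/s)
    u:         control input (e.g. motor voltage or torque)
    The w_u * u**2 term is the control-effort penalty.
    """
    # Wrap the angle to [-pi, pi] so that theta = 2*pi also counts as upright
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return w_theta * theta**2 + w_theta_dot * theta_dot**2 + w_u * u**2


def episode_loss(trajectory, **weights):
    """Mean per-step cost over a list of (theta, theta_dot, u) tuples."""
    if not trajectory:
        raise ValueError("trajectory must be non-empty")
    total = sum(step_cost(th, thd, u, **weights) for th, thd, u in trajectory)
    return total / len(trajectory)


if __name__ == "__main__":
    # Upright and at rest with no input: zero cost
    print(step_cost(0.0, 0.0, 0.0))
    # Same state but with u = 2.0: only the effort penalty contributes
    print(step_cost(0.0, 0.0, 2.0))
```

Increasing `w_u` pushes the controller toward smaller inputs, which generally means gentler but slower corrections, so it is worth tuning it alongside the angle weight rather than setting it once.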