The smoothness of the entropy term scales as 1/\sigma, so it blows up as \sigma approaches 0. To get a smooth loss function you therefore need a strictly positive lower bound on \sigma. The question is how small that positive constant must be.
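As a rough illustration of why a lower bound is needed, here is a sketch under the assumption (not stated above) that the entropy term is the Gaussian entropy written directly in terms of \sigma, with a hypothetical lower bound \sigma_{\min} > 0:

\[
H(\sigma) = \log \sigma + \tfrac{1}{2}\log(2\pi e), \qquad
H'(\sigma) = \frac{1}{\sigma}, \qquad
H''(\sigma) = -\frac{1}{\sigma^{2}}.
\]

On the restricted domain \sigma \ge \sigma_{\min} this gives

\[
|H'(\sigma)| \le \frac{1}{\sigma_{\min}}, \qquad
|H''(\sigma)| \le \frac{1}{\sigma_{\min}^{2}},
\]

so both bounds are finite for any \sigma_{\min} > 0 and diverge as \sigma_{\min} \to 0. The exact exponent depends on how the term is parameterized and on which notion of smoothness is meant; the point of the sketch is only that no finite constant works without a positive lower bound.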