It might be a problem with the initialization: the σ (sigmoid) activation function saturates easily if its inputs are too large. I did a quick check using relu, which seemed to train fine even with 300 hidden neurons.
In any case, a good test is to plot the input-output function of the untrained network. In my example, with σ activations, I got an almost constant function with randomly initialized weights.
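Here's a minimal sketch of that check, assuming a 1-D regression setup in PyTorch (the exact architecture here is illustrative, not from the thread; only the sigmoid activations and the 300-unit hidden layer are taken from the discussion):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)

# Untrained MLP with sigmoid activations and 300 hidden neurons
net = nn.Sequential(
    nn.Linear(1, 300),
    nn.Sigmoid(),
    nn.Linear(300, 1),
)

# Evaluate the freshly initialized network on a grid of inputs
x = torch.linspace(-5, 5, 500).unsqueeze(1)
with torch.no_grad():
    y = net(x)

# If the sigmoids are saturated, this curve is nearly a flat line
plt.plot(x.squeeze(), y.squeeze())
plt.xlabel("input")
plt.ylabel("untrained network output")
plt.show()
```

If the plot comes out almost constant, that's a sign the initialization is pushing the sigmoids into their flat regions; swapping `nn.Sigmoid()` for `nn.ReLU()` in the same sketch gives a quick comparison.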