I can’t comment on your first paragraph, but the truth is most DL applications just don’t need the level of precision afforded by 64-bit floats. Normalization and other forms of regularization generally encourage smaller-magnitude weights, which can more effectively use the limited precision of float32 or even float16. If anything, models that are sensitive to small weight perturbations are also more likely to be susceptible to adversarial attacks, and training with noisy data can actually improve network generalization.
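
For a rough sense of what "limited precision" means here, a quick check of the machine epsilon and range for each dtype (via `torch.finfo`) shows how much you give up going from float64 down to float16:

```python
import torch

# Machine epsilon (smallest relative spacing around 1.0) and max value per dtype.
# float64: ~2.2e-16, float32: ~1.2e-7, float16: ~9.8e-4
for dtype in (torch.float64, torch.float32, torch.float16):
    info = torch.finfo(dtype)
    print(dtype, "eps:", info.eps, "max:", info.max)
```

For well-regularized weights sitting in a modest range, the float32 (or even float16) spacing is usually far below the noise floor of the training process itself.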
As for speeding up training, mixed-precision training has been gaining traction of late (see e.g. torch.cuda.amp).
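
If it helps, here's a minimal sketch of how torch.cuda.amp is typically used; the model, optimizer, and `loader` below are just placeholders for your own:

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer/loss for illustration only.
model = nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss to avoid float16 gradient underflow

for inputs, targets in loader:  # `loader` is assumed to be your DataLoader
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    with autocast():               # forward pass runs in mixed precision
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps the optimizer
    scaler.update()                # adjusts the scale factor for the next iteration
```

On recent GPUs with tensor cores the float16 matmuls in the autocast region can give a substantial speedup, while the master weights and optimizer state stay in float32.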