Incorporating SVD factorization into GPUs with CUDA

I tried using the `opnorm` function, which would be the most efficient option. Sadly, Zygote doesn't have a differentiation rule for it. This link suggested using SVD instead.
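For context, here is why SVD can stand in for `opnorm` as I understand it, assuming the default 2-norm and a real matrix. The symbols below (`u_1`, `v_1`, `\sigma_1`) are my own notation, not taken from the linked post, and the gradient formula only holds when the largest singular value is not repeated.

```latex
% Operator 2-norm equals the largest singular value of A = U \Sigma V^\top
\|A\|_2 = \sigma_1(A), \qquad A = U \Sigma V^\top, \quad \sigma_1 \ge \sigma_2 \ge \dots \ge 0

% Gradient with respect to A, valid when \sigma_1 > \sigma_2
\frac{\partial \sigma_1}{\partial A} = u_1 v_1^\top
```

So only the leading singular triplet is actually needed for the backward pass, which is part of why a full SVD feels like more work than necessary.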

Power iteration may be an alternative, but I'd be concerned about its convergence speed.
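To make the power iteration idea concrete, here is a minimal sketch using only `LinearAlgebra`. The function name `spectral_norm_power`, the `iters` default, and the all-ones starting vector are my own illustrative choices, not tuned values. I have not tested it with Zygote or on `CuArray`s; for GPU use I would assume the starting vector `v0` has to be passed in as a device array. On the convergence worry: each step shrinks the error by roughly `(σ₂/σ₁)^2`, so it is fast when the top singular value is well separated and slow when the two largest are close.

```julia
using LinearAlgebra

# Estimate the largest singular value of A by power iteration on A'A.
# Written without in-place mutation; `iters` and `v0` are illustrative defaults.
function spectral_norm_power(A::AbstractMatrix;
                             iters::Integer = 100,
                             v0::AbstractVector = ones(eltype(A), size(A, 2)))
    v = v0 / norm(v0)            # unit-norm starting vector
    for _ in 1:iters
        w = A' * (A * v)         # one application of A'A
        v = w / norm(w)          # renormalize to avoid overflow/underflow
    end
    return norm(A * v)           # ||A v|| approximates sigma_1 once v has converged
end

A = [1.0 2.0; 3.0 4.0]
println(spectral_norm_power(A), " vs ", opnorm(A))
```

The sketch does not guard against a zero matrix or a starting vector orthogonal to the leading right singular vector; a random `v0` would be the usual way around the latter.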