Large Tikhonov inverse of a matrix -> CUDA?


I want to explicitly obtain a Tikhonov inverse of an ill-conditioned matrix, for example:

julia> using LinearAlgebra

julia> BLAS.set_num_threads(12)

julia> function Tikhonov(M, λ=1e-15)
           # inverse of the regularized normal matrix M'M + λI
           Tm = inv(M' * M + λ * I(size(M, 2)))
           return Tm
       end
Tikhonov (generic function with 2 methods)


julia> x = randn((12000, 6000));

julia> @time Tikhonov(x);
  5.432725 seconds (13 allocations: 827.002 MiB, 1.67% gc time)
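For reference, these are the standard Tikhonov (ridge) regularized least-squares solution and the corresponding regularized inverse, assuming a real-valued M as produced by randn. The inv(M' * M + λ * I(...)) call in Tikhonov computes the bracketed factor (a 6000×6000 matrix here). Multiplying that factor by M' gives the full regularized inverse. The closed form follows from setting the gradient 2M^T(Mx - b) + 2λx to zero.

```latex
\hat{x}_\lambda = \arg\min_x \; \|Mx - b\|_2^2 + \lambda \|x\|_2^2
              = \left(M^\top M + \lambda I\right)^{-1} M^\top b,
\qquad
M_\lambda^{+} = \left(M^\top M + \lambda I\right)^{-1} M^\top
```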

However, I would like to speed this operation up, but CUDA.jl does not seem to have an inv function for a matrix of this size. With 12 BLAS threads, I think the CPU is already close to its limit.
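A minimal CPU-side sketch, not a GPU solution, assuming λ > 0 so that M'M + λI is symmetric positive definite. Under that assumption, only one triangle of M'M needs to be formed (with BLAS.syrk), and the matrix can be inverted through a Cholesky factorization instead of the general LU-based inv. The name tikhonov_normal_inverse is hypothetical, and whether this is noticeably faster than the original on a given machine is an assumption to check with @time.

One caveat: with randn data the diagonal of M'M is around 12000, so adding λ = 1e-15 falls below Float64 rounding and has no numerical effect. For a genuinely ill-conditioned M, a λ that small can make the factorization throw PosDefException.

```julia
using LinearAlgebra

# Hypothetical helper: returns inv(M'M + λI) via a Cholesky factorization.
# Assumes M is a dense real Float64 matrix and λ > 0 makes M'M + λI positive definite.
function tikhonov_normal_inverse(M::StridedMatrix{Float64}, λ::Float64=1e-15)
    n = size(M, 2)
    # syrk with trans = 'T' computes transpose(M) * M, filling only the upper triangle,
    # roughly half the work of the general product M' * M
    A = BLAS.syrk('U', 'T', 1.0, M)
    # Add λ to the diagonal in place (no extra n×n allocation for λI)
    for i in 1:n
        A[i, i] += λ
    end
    # Factor in place, reading only the upper triangle; throws PosDefException if
    # A is not numerically positive definite (e.g. λ too small for an ill-conditioned M)
    F = cholesky!(Symmetric(A, :U))
    # inv on a Cholesky factorization uses LAPACK potri and returns a full symmetric Matrix
    return inv(F)
end
```

With x from above, this would be called as Tm = tikhonov_normal_inverse(x, λ). Multiply the result by x' if the full regularized inverse is needed rather than just the inverse of the normal matrix.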