Hello. I was wondering whether someone could explain to me why the working space for an SVD (gesvdj to be precise) differs wildly between a call to cusolverDnDgesvdj_bufferSize via the Julia interface and a call via CUDA directly.
When trying to compute an economy-size SVD via Julia's CUDA interface, I have difficulty understanding how an SVD of a 1 000 000 x 2 matrix (of Float64) could require almost 1 GB of workspace. The call to bufferSize() in dense.jl results in a request of
1040136960 bytes
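For reference, this is roughly how the call looks on the Julia side (a minimal sketch; the matrix is random here just to have concrete data, and `svd!` is assumed to dispatch to CUSOLVER's gesvdj path for a `CuMatrix{Float64}`, which is what I observe):

```julia
using CUDA, LinearAlgebra

# Tall-and-skinny Float64 matrix on the GPU, same shape as in my use case.
A = CUDA.rand(Float64, 1_000_000, 2)

# svd! on a CuMatrix ends up calling CUSOLVER's gesvdj;
# the bufferSize query inside dense.jl is where I see ~1 GB requested.
F = svd!(A)
```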
(if my println in dense.jl is to be trusted). Outside Julia, calling the same function directly through the CUDA API, with the same input (and Int32 arguments), the requested workspace amounts to
16000256 bytes.
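This is the direct call I am comparing against (a sketch with error checking and device allocations elided; the parameter choices, in particular `econ = 1` for the economy-size factorization, are the ones I believe match the Julia call):

```c
#include <cusolverDn.h>

int main(void) {
    cusolverDnHandle_t handle;
    gesvdjInfo_t params;
    cusolverDnCreate(&handle);
    cusolverDnCreateGesvdjInfo(&params);

    int m = 1000000, n = 2, lwork = 0;
    /* Device pointers elided for brevity; only the sizes matter
       for the bufferSize query. */
    double *A = NULL, *S = NULL, *U = NULL, *V = NULL;

    cusolverDnDgesvdj_bufferSize(
        handle,
        CUSOLVER_EIG_MODE_VECTOR, /* jobz: compute singular vectors */
        1,                        /* econ = 1: U is m x n, V is n x n */
        m, n,
        A, m,                     /* lda = m */
        S,
        U, m,                     /* ldu */
        V, n,                     /* ldv */
        &lwork, params);
    /* lwork now holds the required workspace size (in doubles);
       this query is what returns the much smaller figure above. */

    cusolverDnDestroyGesvdjInfo(params);
    cusolverDnDestroy(handle);
    return 0;
}
```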
In my user code, I call Julia's svd! function, which turns out to call CUDA's gesvdj; I mention this only to make clear that I did not force a call to any particular CUDA function, nor go through the by now apparently deprecated CUDA interface.
I'm on Julia v1.11.3, with CUDA installed via Pkg (CUDA runtime 12.6, artifact installation; CUDA driver 12.6; CUSOLVER 11.7.1; Julia CUDA package 5.6.1; CUDA_Driver_jll 0.10.4+0; CUDA_Runtime_jll 0.15.5+0).
Any insights would be greatly appreciated, as the target size of the matrices is more in the range of 10^7 to 10^8 rows and between 100 and 1000 columns.
Best regards, Frank