In-place svd?

I need to do SVD many, many times, so I want to have an in-place version of svd().

What I mean by “in-place” is like:

svd!(U, S, V, A)

which works out the svd of A and writes the output matrices into the pre-allocated U, S and V.

Note that LinearAlgebra.svd!, LinearAlgebra.LAPACK.ggsvd! and LinearAlgebra.LAPACK.gesvd! all reuse the memory of the input for intermediate calculations, but the outputs are still allocated.

Thanks.

In general, this won’t help that much for big matrices since svd is O(n^3), and for small matrices, you probably want to be using StaticArrays. That said, it wouldn’t hurt to add this.
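For the small-matrix case, a minimal sketch of what that looks like, assuming StaticArrays.jl is installed (svd has a method for static matrices, though how much of the work avoids the heap depends on the StaticArrays implementation for that size):

using LinearAlgebra, StaticArrays

A = @SMatrix rand(4, 4)         # small, fixed-size matrix
F = svd(A)                      # StaticArrays provides an svd method for static matrices
F.U * Diagonal(F.S) * F.Vt ≈ A  # reconstructs A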

A bit of Googling turned up a recent paper that claims O(n^2) for a deterministic SVD algorithm.

My point is, the !() functions in LinearAlgebra (e.g. eigen!()) are not “in-place” in the conventional sense, and “real in-place” functions are very much needed!

I believe that the O(n^2) is only for very specific matrix structures (e.g. banded with fixed band size). Real in-place methods wouldn’t be bad, but won’t lead to major speedups.

Am I missing something? There seems to be an svd!(A) method in the docs that is just what you want?
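For reference, a quick sketch of what that existing method does: it reuses its argument as scratch space, but the returned factors are still allocated on every call (which turns out to be the point of the question):

using LinearAlgebra

A = rand(100, 100)
F = svd!(copy(A))               # svd! destroys its input, so work on a copy
F.U * Diagonal(F.S) * F.Vt ≈ A  # U, S and Vt are freshly allocated, not user-provided buffers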

By studying the Julia implementation of LAPACK.gesvd! or LAPACK.gesdd! and the LAPACK manual, you can easily write such a function svd!(U, S, V, A) that does exactly what you want. You’ll probably even want to provide an additional argument for a work buffer that you can also recycle in between iterations.

Note that there is no option in LAPACK not to destroy the contents of A. Depending on the jobu and jobv parameters, you can recycle the memory of A to store the final U or V, so that you do not need to allocate one of those, but even if you have allocated space for those, you cannot keep the contents of A. So you would need to take a copy of A if you still need to use it afterwards.
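A minimal sketch of such a wrapper, assuming Float64 input, the thin ('S') factorization, and Julia ≥ 1.7, where the libblastrampoline handle and helpers like BlasInt and chklapackerror are available (these are internals, so the exact names may differ between Julia versions):

# Hypothetical fully pre-allocated svd!(U, S, Vt, A, work): with k = min(m, n),
# U must be m×k, S of length k, Vt k×n; A is destroyed and work is a recyclable buffer.
using LinearAlgebra: BlasInt, libblastrampoline
using LinearAlgebra.BLAS: @blasfunc
using LinearAlgebra.LAPACK: chklapackerror

function svd!(U::Matrix{Float64}, S::Vector{Float64}, Vt::Matrix{Float64},
              A::Matrix{Float64}, work::Vector{Float64})
    m, n = size(A)
    info = Ref{BlasInt}()
    ccall((@blasfunc(dgesvd_), libblastrampoline), Cvoid,
          (Ref{UInt8}, Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt},
           Ptr{Float64}, Ref{BlasInt}, Ptr{Float64}, Ptr{Float64},
           Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt}, Ptr{Float64},
           Ref{BlasInt}, Ptr{BlasInt}, Clong, Clong),
          'S', 'S', m, n, A, max(1, stride(A, 2)), S, U,
          max(1, stride(U, 2)), Vt, max(1, stride(Vt, 2)),
          work, length(work), info, 1, 1)
    chklapackerror(info[])
    return U, S, Vt
end

The work buffer has to meet dgesvd’s minimum size; the usual trick (also used in Julia’s own gesvd!) is a workspace query with lwork = -1 to get the optimal length once, and then reuse that buffer for every call. And as noted above, A is destroyed, so pass a scratch copy if you still need its contents.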

I guess the speed-up would be significant, especially if lots of garbage collections are triggered by running the non-in-place versions many, many times.

LinearAlgebra.svd!() allocates memory for the output, i.e., I could NOT pre-allocate the memory for the output. It just reuses the memory of the input for intermediate calculations.

I’m confused as to what you want exactly.

The reason this isn’t significant is that svd is slow. Specifically, for a 100x100 matrix, svd takes 1.5ms, while allocating the output arrays takes about 7us, and a gc run is in the 10ms range but won’t be triggered frequently. As such, in a loop running svd on a 100x100 matrix, I’m seeing 0.3% gc time on average. For smaller matrices it can matter more, but I’m only seeing 1% gc time for 20x20 matrices, and if you get much smaller, you should be using StaticArrays anyway.
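A rough sketch of how one could reproduce that kind of measurement, assuming BenchmarkTools.jl (the exact numbers depend on the machine and BLAS build):

using LinearAlgebra, BenchmarkTools

A = rand(100, 100)
@btime svd($A)                                  # dominated by the O(n^3) LAPACK work
@btime (similar($A), similar($A), zeros(100))   # rough cost of allocating the outputs
@time for _ in 1:1000; svd(A); end              # reports time, allocations and gc time for the loop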

What about the complexity of LinearAlgebra.eigen() then? Is it worth developing an in-place version of it?
Thanks.

eigen is slower than svd, unless it’s a Hermitian matrix. Probably @Oscar_Smith is correct for factorisations. For multiplying matrices, even though the multiplication is also O(N^3) while the required memory for the result is O(N^2), I do think it can make a difference, just because this O(N^3) has been so optimised (and has a much smaller prefactor than in the svd or eigen case).
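For matrix multiplication, LinearAlgebra already provides that in-place entry point, mul!(C, A, B), which writes the product into a pre-allocated C; a minimal sketch of the pattern being described:

using LinearAlgebra

n = 1000
A, B = rand(n, n), rand(n, n)
C = Matrix{Float64}(undef, n, n)   # pre-allocated O(N^2) output buffer
for _ in 1:10
    mul!(C, A, B)                  # O(N^3) work, but no per-iteration allocation of the result
end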

It has been my experience that Julia’s GC, despite certainly having been much improved, still has some overhead when a lot of big temporary arrays are involved, in comparison to Matlab or Python. But I would need to do more thorough benchmarking and find a good test case; it’s hard to carefully compare this in a way that excludes all other possible causes. It is, however, a fact that in TensorOperations.jl I added a package-wide cache structure specifically for storing intermediate arrays in tensor contractions, and (provided they all fit in memory) this can improve run time by a significant fraction.
