In-place svd?

I need to do SVD many, many times, so I want an in-place version of svd().

What I mean by “in-place” is like:

svd!(U, S, V, A)

which computes the SVD of A and writes the output matrices into the pre-allocated U, S and V.

Note that LinearAlgebra.svd!, LinearAlgebra.LAPACK.ggsvd! and LinearAlgebra.LAPACK.gesvd! all reuse the memory of the input for intermediate calculations, but all outputs are freshly allocated.

Thanks.

In general, this won’t help that much for big matrices since svd is O(n^3), and for small matrices, you probably want to be using StaticArrays. That said, it wouldn’t hurt to add this.
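For the small-matrix case, a minimal sketch of the StaticArrays route (assuming StaticArrays.jl is installed; the 3×3 size is just for illustration):

using StaticArrays, LinearAlgebra

A = @SMatrix rand(3, 3)          # small, stack-allocated matrix
F = svd(A)                       # StaticArrays provides svd for static matrices
F.U * Diagonal(F.S) * F.Vt ≈ A   # reconstructs A, without heap-allocated outputs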

A bit of googling turned up a recent paper claiming O(n^2) for a deterministic SVD algorithm.

My point is that the !() functions in LinearAlgebra (e.g. eigen!()) are not “in-place” in the conventional sense, and “real in-place” functions are much needed!

I believe that the O(n^2) result only applies to matrices with very specific structure (e.g. banded with fixed bandwidth). Real in-place methods wouldn’t be bad, but they won’t lead to major speedups.

1 Like

docs

Am I missing something? There seems to be an svd!(A) method that is just what you want?

1 Like

By studying the Julia implementation of LAPACK.gesvd! or LAPACK.gesdd! and the LAPACK manual, you can easily write such a function svd!(U, S, V, A) that does exactly what you want. You’ll probably even want to provide an additional argument for a work buffer that you can also recycle in between iterations.

Note that there is no option in LAPACK not to destroy the contents of A. Depending on the jobu and jobv parameters, you can recycle the memory of A to store the final U or V, so that you do not need to allocate one of those, but even if you have allocated space for those, you cannot keep the contents of A. So you would need to take a copy of A if you still need to use it afterwards.
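For example, a minimal sketch of that “recycle A for U” option (the jobu = 'O' mode of gesvd!; matrix sizes are just for illustration):

using LinearAlgebra

A = rand(6, 4)
Awork = copy(A)   # gesvd! destroys its input, so keep a copy if A is still needed

# jobu = 'O': the thin U is written into the memory of Awork itself;
# jobvt = 'S': only the first min(m, n) rows of Vt are computed.
U, S, Vt = LinearAlgebra.LAPACK.gesvd!('O', 'S', Awork)

U === Awork                # true: no separate allocation for U
U * Diagonal(S) * Vt ≈ A   # reconstructs the original matrix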

2 Likes

I guess the speed-up would be significant, especially if running the non-in-place versions many, many times triggers lots of garbage collections.

LinearAlgebra.svd!() allocates memory for the output, i.e., I could NOT pre-allocate the memory for the output. It just reuses the memory of the input for intermediate calculations.
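A quick (rough, global-scope) way to see this:

using LinearAlgebra

A = rand(50, 50)
svd!(copy(A))         # warm-up call so compilation is not counted below
B = copy(A)
@allocated svd!(B)    # nonzero: U, S and Vt are freshly allocated on every call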

I’m confused as to what you want exactly.

1 Like

The reason this isn’t significant is that svd is slow. Specifically, for a 100x100 matrix, svd takes 1.5ms, while allocating arrays for the output takes about 7us, and a GC run is in the 10ms range but won’t be triggered frequently. As such, for a loop running svd on a 100x100 matrix, I’m seeing 0.3% GC time on average. For smaller matrices it can matter more, but I’m only seeing 1% GC time for 20x20 matrices, and if you get much smaller than that, you should be using StaticArrays anyway.
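For reference, a rough way to reproduce that comparison (BenchmarkTools.jl assumed; exact timings depend on the machine and BLAS):

using LinearAlgebra, BenchmarkTools

A = rand(100, 100)
@btime svd($A);                                  # the O(n^3) factorization itself
@btime (similar($A), zeros(100), similar($A));   # allocating the outputs is far cheaper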

4 Likes

What about the complexity of LinearAlgebra.eigen() then? Is it worth developing an in-place version of it?
Thanks.

eigen is slower than svd, unless it’s a Hermitian matrix, so @Oscar_Smith is probably correct for factorisations. For matrix multiplication, even though the operation is also O(N^3) while the memory required for the result is O(N^2), I do think in-place versions can make a difference, just because that O(N^3) is so heavily optimised (and has a much smaller prefactor than in the svd or eigen case).
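For the multiplication case, a minimal sketch of the in-place route with mul! (BenchmarkTools.jl assumed; sizes are illustrative):

using LinearAlgebra, BenchmarkTools

A = rand(200, 200); B = rand(200, 200); C = similar(A)

@btime $A * $B;           # allocates a fresh 200×200 result on every call
@btime mul!($C, $A, $B);  # writes into the preallocated C, no allocations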

It has been my experience that Julia’s GC, despite certainly having improved a lot, still has some overhead when many big temporary arrays are involved, compared to Matlab or Python. But I would need to do more thorough benchmarking and find a good test case; it’s hard to compare this carefully in a way that excludes all other possible causes. It is however a fact that in TensorOperations.jl I added a package-wide cache structure specifically for storing intermediate arrays in tensor contractions, and (provided they all fit in memory) this can improve run time by a significant fraction.

2 Likes

The use of ! in Julia is a little confusing.
For low-level programmers, functions which overwrite the input (in place) are usually also non-allocating, while in Julia that’s not necessarily the case.

It would be great to have a non-allocating API option for BLAS / LAPACK operations, especially in the context of the advances in StaticCompiler.jl.
If one wants to build a static / dynamic library using Julia, it would be great to be able to guarantee it can be used with no allocations at all.

1 Like

If I recall correctly, the ! does not make any statements about allocations. It is merely a hint that a function modifies its arguments.

2 Likes

Indeed, hence I said “for people coming from low-level programming”. When you use functions which operate on the given buffers, they don’t allocate.

Hence I wish for such APIs in Julia as well. My dream is to have an option to solve a linear system with zero allocations, both for sparse and dense matrices.
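For the dense case, a sketch of how close you can already get with lu! and ldiv! (the factorization still allocates its small pivot vector, so this is not strictly zero-allocation):

using LinearAlgebra

A = rand(100, 100); b = rand(100)
x = similar(b)

F = lu!(A)       # factorizes in place, reusing A's memory (only the pivot vector is allocated)
ldiv!(x, F, b)   # solves into the preallocated x, leaving b untouched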

Agreed that it would be nice to have.

On the other hand, you could also code this on your own. The implementation of gesvd! is quite straightforward, and you can easily see where the allocations happen.

If you still have this problem, you may want to check out KWLinalg.
This is a package in which, among other things, functors are provided that already include the necessary workspace and output matrices. Say you have a matrix

A = rand(ComplexF64, 5, 4)

Then you can define the functor

f = svd_functor_divconquer(5, 4, ComplexF64)

After that, you can call

U, S, Vt = f(A)

with no further allocations (unless you are in global scope). See the example in the README of KWLinalg for a benchmark test.
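To check this on your own machine, something like the following (BenchmarkTools.jl assumed, with f and A as defined above):

using BenchmarkTools

@btime $f($A);   # should report (close to) zero allocations, since the workspace lives in f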

3 Likes