Inverse with destination?

I think it would help if you could write down your whole regression problem in mathematical form, including the form of the error you’re trying to compute. At this stage it seems people are guessing at what you’re trying to do :wink:

Perhaps most importantly - what are you going to do with the covariance matrix of the parameters? If it’s high dimensional will you be summarizing it in some way? Perhaps this can be done without actually computing the matrix itself?

Not that I’m necessarily against having an inv!(dest, src).

1 Like

There is a trade-off between the maintenance cost of supporting in-place variants vs their usefulness (recall that these are generic functions, and need to work for many types). Given that most people think that even inv is rarely used, I doubt this would be a good choice.

Note that even inv! is not part of the API (it is neither exported nor documented), and is thus technically an internal function.

The cost of creating the array should in most cases be completely dwarfed by the cost of doing the factorization.

2 Likes

Not always. For example, when A is tridiagonal, its LU decomposition is bidiagonal, but the inverse can be dense. At least in this case it would help to have inv!(dest, A) with dest preallocated.

In many cases the LU decomposition of a sparse matrix is also sparse. But since the inverse dest will be dense, it can help to have it pre-allocated.
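As a rough illustration of the memory point (a sketch with an arbitrary sparse tridiagonal example; exact counts will vary):

    using LinearAlgebra, SparseArrays

    n = 1_000
    A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))

    F = lu(A)                        # sparse LU: the factors stay sparse, O(n) nonzeros
    nnz(F.L) + nnz(F.U)              # a few thousand stored entries
    count(!iszero, inv(Matrix(A)))   # the inverse is dense: n^2 = 1_000_000 entries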

Perhaps you can show an example benchmarking the time to do the factorization vs copying the output matrix.
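For example, something along these lines (a quick sketch using BenchmarkTools and an arbitrary 1000×1000 dense matrix):

    using LinearAlgebra, BenchmarkTools

    n = 1_000
    A = rand(n, n)

    @btime lu($A)       # cost of the factorization
    @btime similar($A)  # cost of allocating an n×n output array
    @btime copy($A)     # cost of allocating and copying the output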

1 Like

It’s about memory, not time. I want to avoid allocating space for the inverse matrix twice.

Please note that I’m not asking about an in-place inv!(A).
I just want to provide the pre-allocated space for the inverse in inv!(dest, A), where dest is a different array from A.

1 Like

Look at the primitives in LinearAlgebra.LAPACK. Note that you still need to pre-allocate a temporary buffer for the pivot indices if you want to use LU.
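For concreteness, here is a minimal sketch of what such an inv!(dest, A) could look like on top of those primitives, assuming a dense Float64 matrix (the function name is just illustrative, and getrf! here still allocates the pivot vector internally):

    using LinearAlgebra
    using LinearAlgebra: LAPACK

    function inv!(dest::Matrix{Float64}, A::Matrix{Float64})
        copyto!(dest, A)                     # factorize in place inside dest
        _, ipiv, info = LAPACK.getrf!(dest)  # LU factorization; ipiv is allocated by this call
        info == 0 || throw(SingularException(info))
        LAPACK.getri!(dest, ipiv)            # overwrite the LU factors with the inverse
        return dest
    end

    A = rand(4, 4)
    dest = similar(A)
    inv!(dest, A)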

Also note that for the particular problem we discussed above,

  1. this is entirely unnecessary, and

  2. the low-hanging fruit is avoiding formation of the covariance matrix, which will save you much, much more than simply optimizing the wrong algorithm.

I wish I could. But how else can I estimate the errors of the regressed parameters? (Maybe I’m misunderstanding something you said before.)

If you have a tridiagonal matrix, then you can compute the diagonal elements of the inverse in O(n²) work and O(n) auxiliary storage simply by (1) computing the LU (or LDLᵀ or Cholesky) factorization F in O(n) work and storage, then (2) solving F \ eₖ one at a time, where the eₖ are the columns of the identity matrix, picking out the k-th (diagonal) entry of each solution and discarding the rest of the vector.
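A sketch of that recipe (assuming a SymTridiagonal example; each solve is O(n), so the loop is O(n²) total):

    using LinearAlgebra

    n = 1_000
    A = SymTridiagonal(fill(2.0, n), fill(-1.0, n - 1))   # example tridiagonal matrix

    F = ldlt(A)              # O(n) factorization, O(n) storage
    e = zeros(n)             # reusable right-hand side (a column of the identity)
    d = zeros(n)             # will hold diag(A⁻¹)
    for k in 1:n
        e[k] = 1.0
        d[k] = (F \ e)[k]    # k-th column of A⁻¹; keep only its diagonal entry
        e[k] = 0.0           # (ldiv! with a reused buffer would avoid the temporary from F \ e)
    end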

In fact, you can get the whole diagonal computation for A⁻¹ down to O(n) work and O(n) memory: see Rybicki and Hummer (1990). (There are even fast algorithms to compute the diagonal of the inverse of general spd matrices, but they are much more complicated.)

In general, it seems pretty rare to need the whole inverse matrix explicitly — you almost always want just some function of the inverse, e.g. selected elements or solves for selected right-hand-sides. (It’s still convenient to compute the inverse explicitly sometimes when you are exploring a problem, of course, even if you probably want to optimize that out of production code.)

4 Likes

That’s equivalent to computing the inverse, isn’t it?

Edit: Well, except that discarding the rest of the vector, as you said, saves lots of memory.

First, note that for WLS, it is better to work with

\tilde{X} = UX
\tilde{Y} = UY

where W = U'U using a Cholesky decomposition. This reduces a lot of your formulas to the OLS case, so let’s talk about that.

Second, either a QR or an SVD decomposition is used, e.g.

\tilde{X} = QR

so

(\tilde{X}'\tilde{X})^{-1} = (R'R)^{-1} = R^{-1}{R^{-1}}'

and similarly in the other formulas.

Third, you would never even form these explicitly; you would just keep track of the parts of the same (i.e. either QR or SVD) decomposition throughout the calculations.

This is pretty standard and you can find the details in most texts on OLS. Or just search for “QR least squares” or similar.
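A minimal sketch of that pipeline (made-up data, a diagonal weight matrix, and only R from the thin QR used for the parameter standard errors):

    using LinearAlgebra

    n, p = 100, 3
    X = randn(n, p)
    y = X * [1.0, 2.0, 3.0] .+ 0.1 .* randn(n)
    w = rand(n) .+ 0.5                     # observation weights, W = Diagonal(w)

    U = Diagonal(sqrt.(w))                 # W = U'U (trivial "Cholesky" for a diagonal W)
    X̃ = U * X
    ỹ = U * y

    F = qr(X̃)                             # thin QR: X̃ = QR
    β = F \ ỹ                             # weighted least-squares solution

    σ² = sum(abs2, ỹ - X̃ * β) / (n - p)   # residual variance estimate

    # diag((X̃'X̃)⁻¹) = diag(R⁻¹ R⁻¹') = squared row norms of R⁻¹, so the
    # standard errors of β come out without forming the full covariance matrix
    Rinv = inv(UpperTriangular(F.R))
    se = sqrt.(σ² .* vec(sum(abs2, Rinv, dims = 2)))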

4 Likes

@stevengj @Tamas_Papp Thanks.

1 Like

I see a lot of logic in adding to (every?) function the ability to supply the buffers needed to avoid internal allocation of memory.

It is wise both performance-wise (though indeed in many cases the saving is orders of magnitude smaller than the computation itself) and because in some cases (embedded?) it might be a hard constraint.

So while it is great that there are many people here with knowledge of Weighted and Ordinary Least Squares, I think the feature request itself makes a lot of sense.

It is tricky to add this to the API for every function, since it does not make sense for all types (some may not even be mutable, or don’t benefit from preallocation).

There is a good reason for these methods being in the low- or mid-level implementation, since they are not really generic, and the linear algebra API is.

Of course my remark about every function was taking the idea to the extreme.
In my opinion, one of Julia’s strengths is having low-level control.
Part of that is being able to provide buffers to functions.

So I’d vote for having an API to provide buffers for low-level linear algebra operations.

I don’t see why your comment applies when talking about linear algebra.

1 Like

Wouldn’t it work to do

N = 3; dest = zeros(N, N); p1 = pointer(dest); dest .= inv(rand(N, N)); p2 = pointer(dest); p1 == p2

? It returns true.

(Sorry about it if I’m talking nonsense - just started learning Julia)


Update: sorry, there is still memory allocation happening inside, even though the pointer remains unchanged.
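A quick way to see this (a small sketch using @allocated):

    N = 3; dest = zeros(N, N); A = rand(N, N)
    dest .= inv(A)               # warm-up (the first call also allocates for compilation)
    @allocated dest .= inv(A)    # nonzero: inv(A) still creates a temporary matrix before the copy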