Tolerance for numerical-rank computation

Hi all! Sorry to revive this, but what is the right tolerance to use when computing rank with a pivoted Cholesky factorization? I find that using sqrt(eps(Float64)) usually gives me the same rank as the default rank method for matrices with Float64 elements. Is that good enough?

Your question is ill-posed. What do you mean by “right” or “good enough?”

Realize that what you are computing is not the rank of a matrix, but a “numerical rank”. For example, the default SVD-based rank(A) algorithm counts the number of singular values above some threshold. What threshold you set will depend on your application.
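For concreteness, here is a minimal sketch of that idea (the matrix A and the threshold tol below are made-up examples, not recommendations):

    using LinearAlgebra

    A = [1.0 2.0; 2.0 4.0 + 1e-14]       # nearly rank-1 example matrix
    tol = 1e-8 * opnorm(A)               # an arbitrary threshold, scaled by the norm of A
    numrank = count(>(tol), svdvals(A))  # "numerical rank": singular values above tol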

It is not normally practical to compute the “exact” rank of a matrix because floating-point roundoff errors make it nearly impossible to distinguish a singular value that is tiny from one that is zero.

If you want to know what tolerance you should set in a Cholesky-based rank computation (or some other algorithm, like a QR-based rank) to guarantee that it matches the result of the SVD-based rank computation, I don’t think there is any straightforward answer. Even the same algorithm may give different rank values on different machines or with different compilers, due to slight differences in roundoff errors.

Certainly, a tolerance of sqrt(eps()) will give a different answer from rank(A) for some nearly singular matrices, because the default rank tolerance is much smaller: min(size(A)...)*eps(). But even if you set the “same” tolerance, you will often get somewhat different answers from different rank algorithms.
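For example (a toy illustration; the matrix below is just an easy way to get a tiny-but-nonzero singular value):

    using LinearAlgebra

    A = [1.0 1.0; 1.0 1.0 + 1e-12]  # nearly singular
    rank(A)                         # 2 with the default tolerance min(size(A)...)*eps()
    rank(A; rtol = sqrt(eps()))     # 1: the looser tolerance drops the tiny singular value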


Thanks for your response, and sorry for the ill-posed question. I guess this is a better one: how do I go about choosing an appropriate tolerance for my application? In my specific case, I’m trying to compute a quadratic form x' Sinv x, where Sinv is the inverse of a symmetric matrix S that is itself a function of theta. For some inputs theta, S may be positive semidefinite (PSD), while for most inputs it is positive definite (PD). I found a paper that shows how to compute the generalized inverse of a PSD matrix using a pivoted Cholesky decomposition. In case S is PSD, I have code to compute the quadratic form using the paper’s definition of the generalized inverse, and I verified that it matches an SVD-based computation of the quadratic form. In case S is PD, I just use sum(abs2, L \ x), where L is the Cholesky factor.
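Here is a rough sketch of what I’m doing (the quadform name is just a placeholder, and I’ve shown an SVD/pinv fallback for the PSD branch instead of the paper’s pivoted-Cholesky generalized inverse, which is longer):

    using LinearAlgebra

    # S is symmetric; returns x' * Sinv * x, where Sinv is inv(S) or a pseudo-inverse.
    function quadform(S::AbstractMatrix, x::AbstractVector)
        C = cholesky(Symmetric(S); check = false)
        if issuccess(C)                  # S is numerically positive definite
            return sum(abs2, C.L \ x)    # x' inv(S) x = sum(abs2, L \ x), since S = L*L'
        else                             # PSD case: SVD-based pseudo-inverse, for cross-checking
            return dot(x, pinv(Matrix(S)) * x)
        end
    end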

When S is singular, this is generally infinite. Why is this a meaningful thing to compute? Do you need some sort of regularization?

I’m not sure what you mean by “generalized inverse”. Do you mean a pseudo-inverse? That should be similar in effect to a Tikhonov (“ridge”) regularization, but the latter is much easier to compute: just replace S with S+\lambda I for some regularization parameter \lambda > 0.

What regularization (if any) is appropriate really depends on your problem, and you haven’t given enough context for us to say much. (The issue of regularization is closely related to tolerance for numerical rank.)

By generalized inverse I mean the Moore-Penrose pseudo-inverse. I was using the terminology from the MASS package in R (https://www.rdocumentation.org/packages/MASS/versions/7.3-61/topics/ginv). I’m reimplementing an R package in Julia, and the original R package computes the quadratic form as x' ginv(S) x. I am still investigating which values of theta make the S matrix PSD instead of positive definite. It is possible that it only happens on the boundary of the simplex on which theta resides. It’s hard to investigate because this happens inside an optimization problem.

So, you are minimizing the quadratic form or something like that? So that the optimum S should be far from singular, and you just want the optimizer not to break if you stray into a region of parameter space where S is singular?

In that case, I probably would just use a Tikhonov regularization. You’ll have to tune things a little bit to get a reasonable regularization parameter \lambda (you might want to make it proportional to norm(S) so that the coefficient is dimensionless) for your problem, but that shouldn’t be too hard.

Using a pseudo-inverse (pinv(S) in Julia) corresponds to simply dropping singular values when they get too small (and again, you have to choose a tolerance — no free lunch!), which introduces a discontinuity into your optimization problem that may cause trouble for an optimizer. In contrast, a Tikhonov regularization smoothly suppresses singular values that are smaller than \lambda, and hence should be better behaved for optimization.
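In code, the difference might look something like this (a sketch, with a made-up dimensionless coefficient α that you would have to tune for your problem; S and x are your matrix and vector):

    using LinearAlgebra

    α = 1e-8         # made-up regularization coefficient; tune for your problem
    λ = α * norm(S)  # scale λ by a norm of S so that α is dimensionless
    q_tikhonov = dot(x, (Symmetric(S) + λ*I) \ x)          # smooth; S + λI is PD whenever S is PSD
    q_pinv     = dot(x, pinv(S; rtol = sqrt(eps())) * x)   # hard cutoff on small singular values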


This is super useful thanks! The actual problem is of the form: x(theta)’ * inv(S(theta)) * x(theta) which has to be minimized. I’ll look into Tikhonov regularization.

That’s essentially what the R package does: it uses a tolerance to decide which singular values to drop.

Just to confirm, your suggestion is to solve:

\arg\min_{\theta} \left\{\ x(\theta)^\top \; [S(\theta) + \lambda I]^{-1}\; x(\theta)\ \right\}

for an appropriate penalty \lambda, right?

Right.
