I understand this is inevitable for matricies with float values. But for Matricies with integer and rational values it should be possible to calculate the rank reliably. While I understand this would likely be more computationaly expensive, having confidence in the answer would surely be worth it in some use cases.
Maybe I missed such a method. If not, do you think it would be a good idea for a pull request to LinearAlgebra.jl or even base, where rank was moved?
The boring answer to your question “how reliable is rank(A)” is given by
help?> rank
search: rank RankDeficientException lowrankupdate lowrankupdate! lowrankdowndate lowrankdowndate! readlink ReentrantLock rand range randn transcode transpose Transpose transpose! LinRange UnitRange StepRange StepRangeLen
rank(A::AbstractMatrix; atol::Real=0, rtol::Real=atol>0 ? 0 : n*ϵ)
rank(A::AbstractMatrix, rtol::Real)
Compute the rank of a matrix by counting how many singular values of A have magnitude greater than max(atol, rtol*σ₁) where σ₁ is A's largest singular value. atol and rtol are the absolute and relative tolerances,
respectively. The default relative tolerance is n*ϵ, where n is the size of the smallest dimension of A, and ϵ is the eps of the element type of A.
It appears that, in your example, the result does indeed conform to the docs: The singular values are of order 1e8 and 1e-8 and eps(1.0) is of order 1e-16.
Now you are making a good case that this is maybe not ideal for non floating-point numbers.
Interesting question. You could write an algorithm for computing rank of integer or rational matrices using QR or LU decomposition and exact arithmetic, but for any finite representation, the internal arithmetic would still be subject to overflow. You’d get similarly “unreliable” answers for easily constructed pathological examples.
And like floating-point rank, this would happen maybe more than one would expect, for example, as the denominators of rationals got larger and larger under iterated arithmetic, until they eventually overflowed the integer type used for the rationals.
EDIT: Not QR, since the normalization for Q requires square roots.
This is the only thing that it’s really practical for the standard library to compute. Exact computations of rank for integer and rational matrices required algorithms that have much worse complexity, are mainly useful for number theory and similar areas, and probably belong in a specialized package.
If you want to simply apply reduced row echelon (rref) form (i.e. Gaussian elimination), then you will need to use arbitrary-precision arithmetic rational numbers and the cost will grow much faster than n^3 (the cost in fixed precision) with the size of the matrix. (In practice, virtually no one uses rref form outside of first-year linear-algebra classes doing hand calculations for tiny matrices.)
There are apparently more specialized algorithms to compute exact ranks of integer matrices that are better than naive rational-arithmetic Gaussian elimination, but they are still worse than n^4 complexity and still require arbitrary-precision arithmetic. A package is the right place for such things.
(I don’t think people would be happy if rank for a 1000x1000 matrix suddenly got many orders of magnitude slower when the matrix had integer entries. Whereas if you use some package ExactRank.jl with a function exactrank that documents its complexity, then you know what you are getting.)
We do use this for det(::Matrix{BigInt}). It’s not reasonable to use this for other integer types, however, because it will quickly overflow any fixed precision.
In principle, we could do something similar to compute the rank of BigInt matrices, but the user would still have to explicitly convert Int matrices to BigInt. (The frustrating thing here is that the output of rank is a small integer, unlike det, but arbitrary precision is still required for intermediate computations, and that’s something we would never want to do without being explicitly requested.)
If there is a simple exact algorithm for rank(::AbstractMatrix{BigInt}) that has reasonable complexity, e.g. based on the Bareiss algorithm, it might be worth including in Base.