I edited the code in the opening post. As @sijo suggested, I replaced `Vector{Matrix{tw}}` with `Tuple{Vararg{Matrix{tw}}}`, and in the single-threaded code the allocations are now indeed minimal. This is not the case with the multithreaded code.
@foobar_lv2 Technically, floats do not constitute a field, but they are an approximation for the field of real/complex numbers, so I count them in.
The fields I mainly have in mind: `Mod{p}` $\approx \mathbb{Z}_p$, `Rational{Int}` $\approx \mathbb{Q}$, `Float64` $\approx \mathbb{R}$, `ComplexF64` $\approx \mathbb{C}$, $\mathbb{Z}_{(p)}$ (localized ring, as a subring of $\mathbb{Q}$), `Rational{Polynomial{Mod{p},:t}}` $\approx \mathbb{Z}_p(t)$, … Perhaps the most important case for me at the moment is $\mathbb{Z}_2$.
Rank is certainly not the only invariant needed. In homological algebra, I don’t have just one matrix but a whole diagram of them, and I need to simultaneously do row/column operations on them.
@mstewart Before I start implementing the LU decomposition over a generic field, could I please clarify the outline of your proposed solution? Given X, U, V, Ui, Vi, where X is rectangular, will it be possible to mutate X into a diagonal matrix and update U, V, Ui, Vi accordingly?
How would this be achieved? I don’t understand your proposal. After I construct L and U, how do I get to the diagonal matrix?
My use-case is that I have a chain complex. I need to diagonalize the boundary matrices one by one, and mutate the accompanying homotopy equivalences…
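To make the question concrete, here is a rough sketch of the kind of in-place diagonalization I mean. It is not the LU-based proposal, just plain row/column elimination over an exact field. The function names (`swaprows!`, `addrow!`, `swapcols!`, `addcol!`, `diagonalize!`) are made up for illustration, and I am assuming the invariants `U * X0 * V == X`, `Ui == inv(U)`, `Vi == inv(V)`, where `X0` is the original matrix. The element type only needs `iszero`, `+`, `-`, `*` and `/`.

```julia
using LinearAlgebra

# Row operations act on X and U; the inverse operation acts on the columns of Ui.
function swaprows!(X, U, Ui, i, j)
    i == j && return
    for M in (X, U), c in axes(M, 2)
        M[i, c], M[j, c] = M[j, c], M[i, c]
    end
    for r in axes(Ui, 1)
        Ui[r, i], Ui[r, j] = Ui[r, j], Ui[r, i]
    end
end

# row i += c * row j
function addrow!(X, U, Ui, i, j, c)
    X[i, :] .+= c .* X[j, :]
    U[i, :] .+= c .* U[j, :]
    Ui[:, j] .-= c .* Ui[:, i]
end

# Column operations act on X and V; the inverse operation acts on the rows of Vi.
function swapcols!(X, V, Vi, i, j)
    i == j && return
    for M in (X, V), r in axes(M, 1)
        M[r, i], M[r, j] = M[r, j], M[r, i]
    end
    for c in axes(Vi, 2)
        Vi[i, c], Vi[j, c] = Vi[j, c], Vi[i, c]
    end
end

# column i += c * column j
function addcol!(X, V, Vi, i, j, c)
    X[:, i] .+= c .* X[:, j]
    V[:, i] .+= c .* V[:, j]
    Vi[j, :] .-= c .* Vi[i, :]
end

# Mutate X into a diagonal matrix, updating U, V, Ui, Vi along the way.
function diagonalize!(X, U, V, Ui, Vi)
    m, n = size(X)
    for k in 1:min(m, n)
        # Any nonzero entry works as a pivot in exact arithmetic.
        p = findfirst(!iszero, view(X, k:m, k:n))
        p === nothing && break
        swaprows!(X, U, Ui, k, k + p[1] - 1)
        swapcols!(X, V, Vi, k, k + p[2] - 1)
        for i in k+1:m   # clear column k below the pivot
            iszero(X[i, k]) || addrow!(X, U, Ui, i, k, -X[i, k] / X[k, k])
        end
        for j in k+1:n   # clear row k to the right of the pivot
            iszero(X[k, j]) || addcol!(X, V, Vi, j, k, -X[k, j] / X[k, k])
        end
    end
    return X
end

# Small example over Rational{Int}
X0 = Rational{Int}[2 4 1; 1 2 3]
X = copy(X0)
m, n = size(X)
U = Matrix{Rational{Int}}(I, m, m); Ui = copy(U)
V = Matrix{Rational{Int}}(I, n, n); Vi = copy(V)
diagonalize!(X, U, V, Ui, Vi)
```

This takes the first nonzero entry as pivot, which is fine for exact fields such as `Rational{Int}` or `Mod{p}`, but for `Float64` one would presumably want to pick the largest entry instead. The rank is then the number of nonzero diagonal entries of `X`.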