GenericLinearAlgebra doesn’t do any of the blocking or repacking that is necessary to achieve high performance (even for slower datatypes). An ideal version would apply all the performance tricks that still matter while staying generic.
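To make "blocking" concrete, here is a minimal sketch of a tiled matrix multiply that stays generic over the element type. It is not GenericLinearAlgebra's code: the name `blocked_matmul` and the `bs` and `zero` parameters are made up for illustration. Python is used only to show the loop structure, so any cache benefit would only show up in a compiled implementation, and repacking is not shown at all.

```python
from fractions import Fraction

def blocked_matmul(A, B, bs=2, zero=Fraction(0)):
    """Tiled matrix product C = A * B over any type supporting + and *.

    A is n x m, B is m x p (lists of rows). `bs` is the tile edge length
    and `zero` is the additive identity of the element type. Both are
    hypothetical parameters chosen for this sketch.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[zero for _ in range(p)] for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops do the
    # textbook update restricted to one tile at a time.
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + bs, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, p)):
                            Ci[j] = Ci[j] + a * Bk[j]
    return C

# Exact rational entries stand in for a "slower datatype".
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
B = [[Fraction(5), Fraction(6)], [Fraction(7), Fraction(8)]]
C = blocked_matmul(A, B, bs=1)
```

With `bs` set to the full matrix size, the tile loops each run once and this collapses to the plain triple loop, so the tiling changes only the order in which the same scalar operations are performed.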
Also, it’s probably worth implementing sub-cubic algorithms, since they do better when multiplication is more costly than addition, which is true for most of the more complex number types.
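As a rough illustration of that trade-off, the sketch below runs Strassen's algorithm (one well-known sub-cubic method) against the textbook loop and counts scalar operations. The `Counted` wrapper and the `count_ops` helper are hypothetical names invented for this example, and the recursion assumes square matrices whose size is a power of two and recurses all the way down to 1x1, which a practical implementation would not do.

```python
from fractions import Fraction

class Counted:
    """Hypothetical wrapper that tallies scalar multiplications and additions."""
    mults = 0
    adds = 0
    def __init__(self, v):
        self.v = v
    def __add__(self, o):
        Counted.adds += 1
        return Counted(self.v + o.v)
    def __sub__(self, o):
        Counted.adds += 1  # subtractions are counted as additions
        return Counted(self.v - o.v)
    def __mul__(self, o):
        Counted.mults += 1
        return Counted(self.v * o.v)

def naive_matmul(A, B):
    """Textbook triple loop for square matrices."""
    n = len(A)
    C = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = A[i][0] * B[0][j]
            for k in range(1, n):
                acc = acc + A[i][k] * B[k][j]
            row.append(acc)
        C.append(row)
    return C

def _add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _split(M):
    """Split a square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B):
    """Strassen's algorithm; assumes n is a power of two."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = _split(A)
    B11, B12, B21, B22 = _split(B)
    # Seven recursive products instead of eight.
    M1 = strassen(_add(A11, A22), _add(B11, B22))
    M2 = strassen(_add(A21, A22), B11)
    M3 = strassen(A11, _sub(B12, B22))
    M4 = strassen(A22, _sub(B21, B11))
    M5 = strassen(_add(A11, A12), B22)
    M6 = strassen(_sub(A21, A11), _add(B11, B12))
    M7 = strassen(_sub(A12, A22), _add(B21, B22))
    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(_sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]  # list concatenation
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot

def count_ops(matmul, n):
    """Return (multiplications, additions, result) for an n x n product."""
    Counted.mults = Counted.adds = 0
    A = [[Counted(Fraction(i + j + 1, 3)) for j in range(n)] for i in range(n)]
    B = [[Counted(Fraction(i - j, 7)) for j in range(n)] for i in range(n)]
    C = matmul(A, B)
    return Counted.mults, Counted.adds, [[c.v for c in row] for row in C]
```

Calling `count_ops(naive_matmul, n)` and `count_ops(strassen, n)` for the same `n` shows the pattern: Strassen performs fewer multiplications but more additions, so it only pays off when a multiplication costs noticeably more than an addition.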
Blocking, repacking, and re-ordering only matter for problems that are memory-bound when implemented naively. With arbitrary-precision arithmetic, most of these algorithms should be compute-bound, in which case you might as well use LINPACK/EISPACK-style “textbook” triple-loop algorithms.
With arbitrary precision, you are probably right, but I bet that for something like DoubleDouble the packing still matters. Also, as the blocking becomes less important, the sub-cubic methods become more important, so fanciness of some kind is still necessary for good performance.