It’s changed…in that PaddedMatrices no longer supports multi-threading.
I may add multithreaded support again, but that would probably be motivated by the same “proof of concept” desire of seeing how efficient I could get it, with gold standards to optimize against.
MKL’s multithreading is extremely efficient, and also kicks in at a very small size. Importantly, it also helps when it does, unlike OpenBLAS which tends to trip over its own threads.