I think that’s a good idea. To be fair, I should mention that there was also a slowdown in some cases, especially when the first matrix is much wider than it is tall. The decision on how to vectorize needs to be carefully calibrated. The code would probably look something like:
if some_condition_on_sizes_and_element_types
    mul_as_Mat(A, B)
elseif some_other_condition
    mul_as_Mat(transpose(B), transpose(A)) |> transpose
else
    mul_as_SMatrix(A, B)
end
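To make the shape of that decision a bit more concrete, here is a minimal sketch in plain Julia. Everything in it is an assumption for illustration: `choose_strategy` is a hypothetical helper, the size comparisons are placeholders rather than calibrated thresholds, and the restriction to `Float32`/`Float64` is a guess at what "element types" might mean here. The last few lines only check the identity that the transposed branch relies on, using ordinary `Matrix` values.

```julia
# Hypothetical sketch: the conditions below are placeholders, NOT benchmark results.
# Returns which of the three code paths would be taken for an
# (m × k) * (k × n) product with element type T.
function choose_strategy(m::Int, k::Int, n::Int, ::Type{T}) where {T}
    # Assumption: only hardware floats are worth sending down the vectorized path.
    T <: Union{Float32, Float64} || return :smatrix
    if m >= k          # first matrix is at least as tall as it is wide
        return :mat
    elseif n >= k      # transpose(B) is tall, so the transposed product may fare better
        return :mat_transposed
    else
        return :smatrix
    end
end

# The transposed branch relies on A*B == transpose(transpose(B) * transpose(A)).
A = [1.0 2.0 3.0; 4.0 5.0 6.0]      # 2×3: wider than tall
B = [1.0 0.0; 0.0 1.0; 2.0 2.0]     # 3×2
C = transpose(transpose(B) * transpose(A))
@assert C == A * B
```

The real conditions would have to come from benchmarking across sizes and element types, since the speedups and slowdowns mentioned above depend on both.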
It looks like the second of the two needed PRs is on the verge of being merged into Julia, so this will all work in nightly builds very soon!