Did you notice any particular slowdowns? Do you have an example of your matrix-vector product?
(Though I don’t think this is necessarily needed here.)