Performance optimization on lots of small linear algebra operations

I don’t know if this will be useful, but several years ago I got some fantastic help with a problem that seems at least tangentially related to what you’re doing: