Inplace multiplication by a square matrix

Most of the time in your static versions is spent computing sines and cosines, not in the matrix products. If you really need to recompute these on every call, try a vector math library (e.g. Yeppp or MKL). SIMD instructions for converting integers to floats are also not always available; you might need to build a native system image to get them.
If you want this to be really fast, precompute the sine/cosine arrays, rearrange Z so that k is its first index, and go back to ordinary matrices: BLAS libraries have well-tuned code for this case.
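As a rough sketch of what I mean, and not your actual code: I am assuming here that Z is a 3-index array, that the square matrix acts along k, and that the matrix entries are cosines of angles built from integer indices. The sizes `K`, `m`, `n`, the angle formula and the names `C`, `buf`, `Z0` are all made up for illustration.

```julia
using LinearAlgebra

K, m, n = 8, 5, 4                     # hypothetical sizes

# Hypothetical angles built from integer indices. The table is computed once,
# so the trig calls and the integer-to-float conversions leave the hot loop.
θ = [2π * (i - 1) * (j - 1) / K for i in 1:K, j in 1:K]
C = cos.(θ)                           # K×K square matrix, reused on every call

Z = rand(K, m, n)                     # k is the first index
Z0 = copy(Z)                          # kept only to check the result below
buf = similar(Z)                      # scratch array: mul! must not alias input and output

Zmat = reshape(Z, K, :)               # K×(m*n) matrix sharing memory with Z, no copy
bufmat = reshape(buf, K, :)

mul!(bufmat, C, Zmat)                 # a single dense matrix product (BLAS gemm for Float64)
copyto!(Z, buf)                       # write the result back into Z

# Same result as the allocating product on the original data
@assert Z ≈ reshape(C * reshape(Z0, K, :), size(Z))
```

With k first, the reshape is free because Julia arrays are column-major, so the whole update is one large matrix product instead of many small ones. The scratch array can be allocated once and reused between calls.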
