How to speed up this simple code? Multithreading, SIMD, inbounds