Shared-memory parallelization with large matrix

What kind of machine are you running this on? You'll need CPU cores to run those threads: for CPU-bound work, extra threads only help if each one has a core to run on.
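
If you're not sure how many cores you have, here's a quick way to check. This is only a minimal sketch, and it assumes Python is available on the machine (you haven't said which language you're using). `available_cores` is a name I made up for illustration. It reports the logical core count and, on Linux, how many of those cores your process is actually allowed to use.

```python
import os


def available_cores():
    """Return (logical cores on the machine, cores this process may use)."""
    # os.cpu_count() can return None if the count is undetermined.
    total = os.cpu_count() or 1
    try:
        # Linux only: respects CPU affinity masks (e.g. taskset, some job schedulers).
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        # Not available on macOS/Windows; fall back to the logical count.
        usable = total
    return total, usable


total, usable = available_cores()
print(f"logical cores: {total}, usable by this process: {usable}")
```

If the second number is smaller than the number of threads you're starting, the threads will be sharing cores and you shouldn't expect much of a speedup.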