Running simple loop in parallel

I’m not sure what that does.

But threading only the outer loop is usually better. I tried threading the OP’s code, timed it with BenchmarkTools, and found no difference in speed using all four cores. You need something in the loop that uses more CPU time per iteration; otherwise the overhead of scheduling the threads eats up whatever you gain.
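Here is a rough sketch of what I mean by threading only the outer loop. This is not the OP's code: `work`, `fill_serial!` and `fill_threaded!` are names I made up for illustration, and the inner `sum` over 1000 terms is just an assumed stand-in for "something that uses more CPU time per iteration".

```julia
using Base.Threads

# Hypothetical stand-in for an expensive per-element computation.
work(i, j) = sum(sin(i * j * k) for k in 1:1_000)

# Serial version: plain nested loops.
function fill_serial!(A)
    for j in axes(A, 2)
        for i in axes(A, 1)
            A[i, j] = work(i, j)
        end
    end
    return A
end

# Threaded version: only the outer loop is split across threads.
# Each iteration writes to its own column, so there is no data race.
function fill_threaded!(A)
    @threads for j in axes(A, 2)
        for i in axes(A, 1)
            A[i, j] = work(i, j)
        end
    end
    return A
end

A = fill_serial!(zeros(200, 200))
B = fill_threaded!(zeros(200, 200))
```

To try it, start Julia with several threads (for example `julia -t 4`) and check `Threads.nthreads()`. You can then compare `@btime fill_serial!($A)` against `@btime fill_threaded!($A)` with BenchmarkTools. If you shrink the work per iteration down to almost nothing, I would expect the two timings to move much closer together, which is probably what happened with the original loop.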