First, it is all about overhead. For ordinary operations, spreading the work across threads costs more in overhead than it saves, so it doesn’t speed anything up.
If you go to my repository, there are two branches for two completely different systems, and both show the same result.
Second, 7 runs is very few, which makes the result susceptible to noise. You should run the function many more times.
In my Julia code, BenchmarkTools takes 700 samples and calculates their median. Additionally, I repeat the process 4 times and average those medians.
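As a rough sketch of that Julia procedure (not the exact code from my repository): `work`, the input size, and the helper name `averaged_median_ns` are placeholders made up for illustration, and only the 700 samples and 4 repetitions come from the description above.

```julia
using BenchmarkTools
using Statistics  # for mean

# Placeholder workload; substitute the function you actually want to time.
work(x) = sum(abs2, x)

# Hypothetical helper: benchmark `work` 4 times and average the medians.
function averaged_median_ns(x)
    medians = Float64[]
    for _ in 1:4
        # Collect up to 700 samples; the default time budget of
        # BenchmarkTools can end a run earlier for slow functions.
        # `$x` interpolates the value into the benchmark expression.
        trial = @benchmark work($x) samples=700
        push!(medians, median(trial).time)  # median time in nanoseconds
    end
    return mean(medians)
end

t = averaged_median_ns(rand(1000))
println("average of medians: ", t, " ns")
```

Because `@benchmark` runs the expression many times, the first call (which includes compilation) does not dominate the reported median.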
For Matlab, I use timeit, which is the function Matlab recommends for benchmarking. Internally, it calls the function many times. Again, I repeat the process 4 times and average those medians.
Also, when you measure the time like this, there is a good chance that the operation you are interested in has not yet been compiled and optimized: