The way @RoyiAvital measures performance (using tic/toc and @elapsed) is not reliable. Refer to my benchmarks for more accurate results.
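To illustrate why a single timed call can mislead, here is a minimal sketch that compares one `@elapsed` call with a repeated measurement from the BenchmarkTools package. The function `work` and the matrix size are hypothetical stand-ins, not the operations from the original benchmark.

```julia
using BenchmarkTools  # third-party package: import Pkg; Pkg.add("BenchmarkTools")

# Hypothetical workload, not the operation from the original benchmark.
work(A) = sum(abs2, A)

A = rand(1000, 1000)

# A single @elapsed call on a fresh function also includes JIT compilation time.
t_first = @elapsed work(A)

# @belapsed runs many samples and returns the minimum time in seconds;
# `$A` interpolates the global so its lookup is not part of the measurement.
t_min = @belapsed work($A)

println("first @elapsed call: ", t_first, " s")
println("@belapsed minimum:   ", t_min, " s")
```

The first number is usually dominated by compilation, which is why one-shot timings tend to overstate the cost of the operation itself.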
Windows - System 1:
[Figure: Julia vs Matlab]
Ubuntu - System 2:
[Figure: Julia vs Matlab]
Also, using 6 BLAS threads adds overhead that is not justified for these ordinary operations.
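One way to check the thread overhead on your own machine is to time the same operation under different BLAS thread counts. This is only a sketch: the 200x200 size and the matrix product are assumptions chosen for illustration, not the operations from the benchmarks above, and the outcome depends on the hardware and BLAS build.

```julia
using LinearAlgebra

# Assumption: 200x200 is a stand-in size, not the one used in the benchmarks above.
A = rand(200, 200)
B = rand(200, 200)

A * B  # warm-up call so compilation is excluded from the timings

for n in (1, 6)
    BLAS.set_num_threads(n)
    # Take the minimum over repeated runs to reduce noise.
    t = minimum(@elapsed(A * B) for _ in 1:100)
    println(n, " BLAS thread(s): ", t, " s")
end
```

If the single-thread timing is equal or better, the extra threads are costing more in coordination than they save in compute for that problem size.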