Not for me, unfortunately:
julia> a = rand(4800,10000);
julia> BLAS.set_num_threads(4)
julia> @time cov(a);
11.101160 seconds (12 allocations: 1.103 GiB)
julia> BLAS.set_num_threads(1)
julia> @time cov(a);
10.656343 seconds (12 allocations: 1.103 GiB, 0.59% gc time)
I can only guess there is some issue with the BLAS call.
This is an interesting idea, thanks for the link. I believe this will not be faster here, because I expect that evaluating the matrix vector product once will likely have to actually compute each matrix element (though its probably a good idea to check this, perhaps there is a shortcut I am overlooking).