Bring Intel x86 SIMD sort library to Julia

The original post explained quite clearly why AVX2 performance was bad in Julia.