On a machine with AVX2 I can confirm that numpy.sort is faster, but definitely not 10x (roughly 2.7x here):
```
julia> using BenchmarkTools, PyCall

julia> numpy = pyimport("numpy");

julia> a = randn(Float32, 10^8);

julia> apy = pycall(numpy.array, PyObject, a); # convert to npy array

julia> @btime sort($a); @btime pycall($(numpy.sort), PyObject, $apy);
  2.611 s (14 allocations: 850.76 MiB)
  954.375 ms (1 allocation: 16 bytes)

julia> numpy.__version__
"2.1.3"

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
```
There’s a big difference between numpy v1 and v2 (the latter is much faster). Looking at htop, I see only a single core in use the whole time.
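`sort(a)` returns a freshly allocated array, and `@btime` reports 850.76 MiB allocated, more than the ~381 MiB that 10^8 Float32 values (4 bytes each) need for the output alone. As a rough way to see how much of Julia's time goes to allocation versus sorting, a Base-only sketch can time `sort` against `sort!` on a copy made beforehand. This is a hypothetical helper (`time_sort_variants` is not from the post), it assumes n = 10^7 to keep the run short, and it uses single `time_ns` measurements, so the numbers are only indicative; `@btime` remains the better tool. Note that `numpy.sort` also returns a copy, so the original comparison is already like-for-like.

```julia
using Random

# Hypothetical helper: time sort(a), which allocates its result, against
# sort!(b) on a copy allocated outside the timed region.
function time_sort_variants(n::Integer; seed::Integer = 1)
    a = randn(MersenneTwister(seed), Float32, n)

    # Warm up both code paths so compilation is (mostly) not timed.
    m = min(n, 10_000)
    sort(a[1:m])
    sort!(a[1:m])        # sorts a temporary slice copy, not `a` itself

    t0 = time_ns()
    sorted_copy = sort(a)                 # allocates, like the post's benchmark
    t_sort = (time_ns() - t0) / 1e9

    b = copy(a)                           # allocation done before timing
    t0 = time_ns()
    sort!(b)                              # in-place sort of the copy
    t_inplace = (time_ns() - t0) / 1e9

    return (; t_sort, t_inplace, sorted_copy, inplace = b, original = a)
end

r = time_sort_variants(10^7)
println("sort:  ", round(r.t_sort; digits = 3), " s")
println("sort!: ", round(r.t_inplace; digits = 3), " s")
```

If the two timings are close, allocation is not what separates Julia from numpy in this benchmark; running Julia with one thread, as in the `versioninfo()` above, keeps it comparable to the single core numpy was observed using.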