Skoffer
February 28, 2021, 2:47pm
1
Maybe I am benchmarking it wrong, but it seems that sorting part of a vector (with the help of `@view`) is slower than sorting the original vector.
using BenchmarkTools
v = rand(1000);
v2 = rand(2000);
v2[500:1499] .= v;
and the benchmarks themselves:
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
12.722 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
22.854 μs (0 allocations: 0 bytes)
Is there any way to exploit the existing algorithms for sorting part of an array? It seems wrong to copy-paste the original algorithms and add an offset to all indices.
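For what it's worth, Base's sorting code internally dispatches to methods that take an explicit index range, `sort!(v, lo, hi, alg, order)`. This is not public API and may change between releases, but as a sketch it avoids both the view and the copy:

```julia
using Base.Order: Forward

v2 = rand(2000)
# Internal Base.Sort method: sorts v2[500:1499] in place without a view.
# QuickSort is exported; the (lo, hi) method itself is an implementation detail.
sort!(v2, 500, 1499, QuickSort, Forward)
@assert issorted(@view v2[500:1499])
```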
Shuhua
February 28, 2021, 3:12pm
2
The difference is quite small in my test:
julia> @benchmark sort!(vv) setup = (vv = deepcopy($v))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 8.850 μs (0.00% GC)
median time: 9.100 μs (0.00% GC)
mean time: 9.346 μs (0.00% GC)
maximum time: 21.575 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 4
julia> @benchmark sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499])
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 9.850 μs (0.00% GC)
median time: 10.275 μs (0.00% GC)
mean time: 10.562 μs (0.00% GC)
maximum time: 20.925 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 4
julia> versioninfo()
Julia Version 1.6.0-beta1.0
Commit b84990e1ac (2021-01-08 12:42 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.0 (ORCJIT, sandybridge)
Skoffer
February 28, 2021, 4:02pm
3
Even in this example it's more than a 10% difference, but it shouldn't exist at all, because it's literally the same algorithm over the same data, just with a different offset.
I can do it manually, though I do not like this approach much.
import Base: getindex, size, setindex!

struct LinearSubArray{T} <: AbstractVector{T}
    v::Vector{T}
    offset::Int
    len::Int
end

getindex(x::LinearSubArray, i) = @inbounds x.v[i + x.offset - 1]
setindex!(x::LinearSubArray, v, k) = @inbounds x.v[k + x.offset - 1] = v
size(x::LinearSubArray) = (x.len, )
v = rand(1000);
v2 = rand(2000);
v2[500:1499] .= v;
lv = LinearSubArray(v2, 500, 1000);
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
13.110 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv = deepcopy($lv)) evals = 1;
12.037 μs (0 allocations: 0 bytes)
Weirdly enough, it is even slightly faster (maybe because `length` can be elided?). Still, I would prefer a more canonical way of solving this.
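One possible contributor (an assumption, not something verified against Base's internals) is that the wrapper's `getindex` is unconditionally `@inbounds`, while a `SubArray` still recomputes indices through its parent. If you keep the wrapper, declaring a linear `IndexStyle` may also help generic code take the fast single-integer iteration path:

```julia
# Optional refinement: tell generic code that LinearSubArray supports
# efficient linear indexing, so fallbacks iterate with a plain Int index
# instead of CartesianIndex.
Base.IndexStyle(::Type{<:LinearSubArray}) = IndexLinear()
```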
Shuhua
March 1, 2021, 9:20am
4
I just tested it on a different PC with Julia 1.5.3. The new result indicates no significant difference at all.
julia> @benchmark sort!(vv) setup = (vv = deepcopy($v))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.400 μs (0.00% GC)
median time: 7.860 μs (0.00% GC)
mean time: 8.059 μs (0.00% GC)
maximum time: 28.000 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
julia> @benchmark sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499])
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.580 μs (0.00% GC)
median time: 7.900 μs (0.00% GC)
mean time: 8.096 μs (0.00% GC)
maximum time: 17.300 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, ivybridge)
Environment:
JULIA_NUM_THREADS = 4
Ah, you are right, it looks like it is a regression.
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
17.655 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
11.901 μs (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.5.0-rc1.0
Commit 24f033c951 (2020-06-26 20:13 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
And on a 1.7 development build:
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
12.733 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
15.665 μs (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.7.0-DEV.538
Commit 85354cf8ef (2021-02-15 11:47 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
Thank you, I’ll report an issue.
Ah! Here is the important thing: you didn't add `evals = 1` at the end, so you divided the time difference by 5 (since in your benchmarks you have 5 evals per sample). Actually, it shrank by even more than a factor of 5, since sorting an already sorted array takes less time.
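To make the effect concrete: `setup` runs once per sample, while the expression runs `evals` times per sample, so with `evals = 5` four of the five evaluations re-sort an already sorted array. A minimal demonstration:

```julia
using BenchmarkTools

v = rand(1000);
# With evals = 5, only the first eval of each sample sorts random data;
# the remaining four re-sort an already sorted array, which is much cheaper.
@btime sort!(vv) setup = (vv = copy($v)) evals = 5;
@btime sort!(vv) setup = (vv = copy($v)) evals = 1;
```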
Shuhua
March 1, 2021, 11:09am
7
julia> @benchmark sort!(vv) setup = (vv = deepcopy($v)) evals=1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 12.200 μs (0.00% GC)
median time: 13.101 μs (0.00% GC)
mean time: 13.640 μs (0.00% GC)
maximum time: 53.300 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals=1
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 12.300 μs (0.00% GC)
median time: 13.299 μs (0.00% GC)
mean time: 13.860 μs (0.00% GC)
maximum time: 53.300 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
This is interesting. Anyway, here is the issue: https://github.com/JuliaLang/julia/issues/39864, so let's wait for its resolution.
Shuhua
March 1, 2021, 11:17am
9
The `sort!` method uses quicksort by default. Why does it work faster for a sorted array? Is there any specific implementation detail in Julia?
A useful Wiki link:
In the very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Med...
I do not think it is related to the implementation or the algorithm. It's just that for a sorted array you are not doing permutations, only comparisons. Fewer operations → faster timing.
Shuhua
March 1, 2021, 11:25am
11
That's a clue. I recall that in the undergraduate course we usually say that naive quicksort displays its worst performance when the input array is already sorted. With the "median-of-three" pivot rule (see the Wiki above), it becomes faster instead.
In fact, the time difference is huge:
julia> arr = randn(10000);
julia> @btime sort!(a) setup=(a=copy($arr)) evals=1;
377.399 μs (0 allocations: 0 bytes)
julia> sort!(arr); # make arr sorted now
julia> @btime sort!(a) setup=(a=copy($arr)) evals=1;
66.399 μs (0 allocations: 0 bytes)
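For reference, a median-of-three pivot turns sorted input into a best case rather than the classic worst case, because the chosen pivot is always near the middle of a sorted range. A simplified, illustrative sketch of median-of-three selection (the function name is hypothetical, not Base's actual internal routine):

```julia
# Return the index of the median of v[lo], v[mi], v[hi] (illustrative only;
# Base's real pivot selection lives in its internal sorting code).
function medianof3index(v, lo, hi)
    mi = (lo + hi) >>> 1
    a, b, c = v[lo], v[mi], v[hi]
    if a < b
        b < c ? mi : (a < c ? hi : lo)
    else
        a < c ? lo : (b < c ? hi : mi)
    end
end

v = [3, 1, 2]
v[medianof3index(v, 1, 3)]  # 2, the median of 3, 1, 2
```

On an already sorted range, `v[mi]` is the median of the three, so each partition splits near the middle and the recursion stays balanced.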
These identical timings are suspicious; maybe you accidentally sorted the arrays at some point prior to benchmarking? Can you try the following in a fresh REPL:
using BenchmarkTools
using StableRNGs
rng = StableRNG(2021);
v = rand(rng, 1000);
v2 = rand(rng, 2000);
v2[500:1499] .= v;
@btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
@btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
`StableRNGs` makes results reproducible between different OSes and Julia versions, so it's a fairer comparison.
Shuhua
March 1, 2021, 11:55am
13
Just did as you suggested.
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
11.600 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
12.300 μs (0 allocations: 0 bytes)
# run a second time
julia> @btime sort!(vv) setup = (vv = deepcopy($v)) evals = 1;
11.801 μs (0 allocations: 0 bytes)
julia> @btime sort!(vv) setup = (vv1 = deepcopy($v2); vv = @view vv1[500:1499]) evals = 1;
12.399 μs (0 allocations: 0 bytes)
There is indeed some difference.
PS: this is the first time I've come across StableRNGs; it is cool.