Recently when working on sorting data with Hilbert curves, I noticed something very peculiar when benchmarking.
function hilbert_sort(m::AbstractMatrix{<:Integer}, b::Integer; rev::Bool = false)
return m[:, hilbert_sortperm(m, b; rev)]
end
function hilbert_sort!(m::AbstractMatrix{<:Integer}, b::Integer; rev::Bool = false)
return Base.permutecols!!(m, hilbert_sortperm(m, b; rev))
end
function hilbert_sortperm(m::AbstractMatrix{<:Integer}, b::Integer; rev::Bool = false)
h = mapslices(__with_bit(b), m; dims = 1) |> vec
return sortperm(h; rev)
end
function hilbert_sortperm!(m::AbstractMatrix{<:Integer}, b::Integer; rev::Bool = false)
h = eachcol(m) .|> __with_bit(b)
return sortperm(h; rev)
end
Where __with_bit is a function factory that returns an expensive-ish computation function (Skillingβs method returning a distance) that mutates. Naturally, I would expect the performance of hilbert_sortperm! > hilbert_sortperm > hilbert_sort! > hilbert_sort. However, the result is the last I would expect:
Large m benchmark
julia> m = rand(Bool, (10, 1000))
10Γ1000 Matrix{Bool}:
[...]
julia> @benchmark hilbert_sort($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 56.600 ΞΌs β¦ 16.335 ms β GC (min β¦ max): 0.00% β¦ 99.36%
Time (median): 62.400 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 68.468 ΞΌs Β± 163.403 ΞΌs β GC (mean Β± Ο): 2.37% Β± 0.99%
βββββββ
β
ββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββ β
56.6 ΞΌs Histogram: frequency by time 116 ΞΌs <
Memory estimate: 33.89 KiB, allocs estimate: 23.
julia> @benchmark hilbert_sort!($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 26.300 ΞΌs β¦ 926.400 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 29.000 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 30.798 ΞΌs Β± 11.887 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββββ
ββββββββββ β
ββββββββββββββββββββββββββββ
ββββββββ
ββββββββββββββββββββ
β
βββ β
26.3 ΞΌs Histogram: log(frequency) by time 68.2 ΞΌs <
Memory estimate: 16.11 KiB, allocs estimate: 17.
julia> @benchmark hilbert_sortperm($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 52.600 ΞΌs β¦ 15.688 ms β GC (min β¦ max): 0.00% β¦ 99.41%
Time (median): 56.900 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 61.121 ΞΌs Β± 156.932 ΞΌs β GC (mean Β± Ο): 2.55% Β± 0.99%
β
βββββ
ββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββ β
52.6 ΞΌs Histogram: frequency by time 103 ΞΌs <
Memory estimate: 23.99 KiB, allocs estimate: 20.
julia> @benchmark hilbert_sortperm!($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 72.700 ΞΌs β¦ 11.514 ms β GC (min β¦ max): 0.00% β¦ 99.06%
Time (median): 76.700 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 80.416 ΞΌs Β± 115.160 ΞΌs β GC (mean Β± Ο): 1.42% Β± 0.99%
βββ
ββ
ββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
72.7 ΞΌs Histogram: frequency by time 119 ΞΌs <
Memory estimate: 23.65 KiB, allocs estimate: 9.
While the performance and allocation of hilbert_sort! beats hilbert_sort as expected, the rest is puzzling.
Although hilbert_sortperm! allocates 9 times instead of 20 times as in hilbert_sortperm, the memory allocation is roughly the same, and the performance is drastically worse. When I tested with smaller m, this result is then reversed:
Small m benchmark
julia> m = rand(Bool, (10, 10))
10Γ10 Matrix{Bool}:
[...]
julia> @benchmark hilbert_sortperm($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 159 evaluations per sample.
Range (min β¦ max): 664.780 ns β¦ 303.486 ΞΌs β GC (min β¦ max): 0.00% β¦ 99.56%
Time (median): 782.390 ns β GC (median): 0.00%
Time (mean Β± Ο): 857.347 ns Β± 4.029 ΞΌs β GC (mean Β± Ο): 7.89% Β± 1.72%
β
βββββ
β
ββββββββββββββ
βββ
ββββββββββββββββββββββββββββββββββββββ β
665 ns Histogram: frequency by time 1.21 ΞΌs <
Memory estimate: 640 bytes, allocs estimate: 15.
julia> @benchmark hilbert_sortperm!($(copy(m)), $1)
BenchmarkTools.Trial: 10000 samples with 258 evaluations per sample.
Range (min β¦ max): 296.899 ns β¦ 164.754 ΞΌs β GC (min β¦ max): 0.00% β¦ 99.65%
Time (median): 366.279 ns β GC (median): 0.00%
Time (mean Β± Ο): 398.245 ns Β± 1.924 ΞΌs β GC (mean Β± Ο): 6.62% Β± 1.41%
βββ
ββ
βββ β
βββββ
β
βββββββββ β
βββββββββββββββββββββββββββββββββ
β
β
β
β
βββββ
ββββββββββββ
β
β
βββββ
β
297 ns Histogram: log(frequency) by time 621 ns <
Memory estimate: 288 bytes, allocs estimate: 4.
Apart from that, hilbert_sort!, which calls hilbert_sortperm, somehow beats the latter in both performance and allocation. I tried using JET to detect type instabilities but no errors. I also tried using a simplified scenario:
function some_sort!(m::AbstractMatrix{<:Integer})
return Base.permutecols!!(m, some_sortperm(m))
end
function some_sortperm(m::AbstractMatrix{<:Integer})
return mapslices(example!, m; dims = 1) |> vec |> sortperm
end
function example!(x::AbstractVector{<:Integer})
for _ in 1:10 # expensive-ish
x .β»= x[end]
end
return sum(x)
end
But cannot replicate the peculiar result:
Simplified benchmark
julia> m = rand(Bool, (10, 1000))
10Γ1000 Matrix{Bool}:
[...]
julia> @benchmark some_sort!($(copy(m)))
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 66.900 ΞΌs β¦ 417.600 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 68.400 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 70.269 ΞΌs Β± 10.250 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββββββββββββββ β
βββββββββββββββββββββββ
β
β
ββ
β
β
βββββ
ββββββββββββ
ββββββββ
β
βββ
β
β β
66.9 ΞΌs Histogram: log(frequency) by time 105 ΞΌs <
Memory estimate: 16.25 KiB, allocs estimate: 19.
julia> @benchmark some_sortperm($(copy(m)))
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min β¦ max): 64.100 ΞΌs β¦ 922.100 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 68.000 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 70.893 ΞΌs Β± 18.556 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
β
ββββββββββββββ β
ββββββββββββββββββββ
β
β
β
ββ
β
ββββ
βββββββββββββββββββββββ
β
ββββββ
β
64.1 ΞΌs Histogram: log(frequency) by time 120 ΞΌs <
Memory estimate: 16.25 KiB, allocs estimate: 19.
This really left me puzzled. I am still new to Julia so any help would be greatly appreciated!