I’m seeking tips on how to improve performance of struct comparisons where the specific comparison rule is selected at runtime.
I have a struct with n
fields, and I’d like to sort by each field in either order.
Performance with a hardcoded rule is 45x faster than my existing dynamic strategy (benchmarks below), so I’m thinking it could be worthwhile to use precompiled rules for all dynamic possibilities. There are n! * 2^n
possible ordering rules, so this might become unwieldy. The example below uses n=3
(48 rules). My application requires n=10
(3715891200 or 3628800 rules)[1].
I considered using the type system to generate these specialized rules, but this might expand to n^n * 2^n
rules, and seems cautioned against.
Generated functions and macros might be another path to success, but I’d like to follow the “don’t do metaprogramming” advice.
Maybe there is some middle-ground with a smaller performance penalty.
Two other other options I’m investigating are:
- Give the struct a FieldVector view and create a “comparison score” by multiplying with a dynamic array of weights of unique powers-of-2 and taking the sum.
- Create a dynamic array of comparison functions and loop through those. This requires
n
or2n
comparison functions.
I’ll post benchmark results of the above once they’re available, but figured it would be wise to ask for advice from more experienced community members in the meantime.
Here are benchmark results for the two techniques with a wide performance gap:
# ========= Struct Definition =========
using Random
using BenchmarkTools
struct MyStruct
a::Int64
b::Int64
c::Int64
end
function Random.rand(rng::AbstractRNG, ::Random.SamplerType{MyStruct})
MyStruct(
rand(rng, Int),
rand(rng, Int),
rand(rng, Int),
)
end
xs = rand(MyStruct, 1000)
# ============ Static Rule ============
function static_lt(a::MyStruct, b::MyStruct)
a.a > b.a && return true
a.a < b.a && return false
a.c > b.c && return true
a.c < b.c && return false
a.b < b.b && return true
a.b > b.b && return false
return false
end
julia> @benchmark sort!(a, lt=static_lt) setup=(a = copy($xs))
BenchmarkTools.Trial:
memory estimate: 11.89 KiB
allocs estimate: 2
--------------
minimum time: 37.459 μs (0.00% GC)
median time: 38.542 μs (0.00% GC)
mean time: 38.913 μs (0.00% GC)
maximum time: 67.398 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
# ========= Dynamic Rule ============
struct FieldOrder
sym::Symbol
ord::Base.Ordering
end
struct StructOrdering <: Base.Ordering
field_orderings::AbstractVector{FieldOrder}
end
function Base.lt(struct_ordering::StructOrdering, a, b)
for field_ordering in struct_ordering.field_orderings
sym = field_ordering.sym
ord = field_ordering.ord
va = getproperty(a, sym)
vb = getproperty(b, sym)
Base.lt(ord, va, vb) && return true
Base.lt(ord, vb, va) && return false
end
false # a and b are equal
end
dynamic_order = StructOrdering([
FieldOrder(:a, Base.Reverse),
FieldOrder(:c, Base.Reverse),
FieldOrder(:b, Base.Forward),
])
julia> @benchmark sort!(a, order=$dynamic_order) setup=(a = copy($xs))
BenchmarkTools.Trial:
memory estimate: 1.71 MiB
allocs estimate: 80248
--------------
minimum time: 1.686 ms (0.00% GC)
median time: 1.707 ms (0.00% GC)
mean time: 1.899 ms (8.83% GC)
maximum time: 50.929 ms (96.21% GC)
--------------
samples: 2622
evals/sample: 1
# =========== Equivalent Dynamic Rule ===========
# This dynamic technique could also be passed to sort's `lt`
function dynamic_lt(struct_ordering::StructOrdering)
function (a::MyStruct, b::MyStruct)
Base.lt(struct_ordering, a, b)
end
end
julia> @benchmark sort!(a, lt=$(dynamic_lt)($dynamic_order)) setup=(a = copy($xs))
BenchmarkTools.Trial:
memory estimate: 1.71 MiB
allocs estimate: 80250
--------------
minimum time: 1.688 ms (0.00% GC)
median time: 1.714 ms (0.00% GC)
mean time: 1.901 ms (8.85% GC)
maximum time: 51.383 ms (96.25% GC)
--------------
samples: 2618
evals/sample: 1
[1] My use case uses a fixed order for each field. e.g. field a
always descending, field b
always ascending, which eliminates the 2^n
term to reduce the total possibilities to n!
.