Efficient lexicographic ordering of custom types?

greatpet · February 13, 2021, 2:05pm

Suppose I have the Julia type

struct Car
    model::String
    year::Int32
    miles::Int32
end

I’d like to overload the isless() function, to enable sorting variables of the type Array{Car, 1}. I would like to sort according to model first, then year, and lastly miles. One implementation is, using Julia’s built-in lexicographic ordering for Tuples,

import Base.isless
function isless(a::Car, b::Car)
    isless((a.model, a.year, a.miles), (b.model, b.year, b.miles))
end

But this involves memory allocation for two Tuples, which can slow down performance when sorting a large array. An alternative implementation is

import Base.isless
function isless(a::Car, b::Car)
    if isless(a.model, b.model)
        return true
    elseif isequal(a.model, b.model)
        if isless(a.year, b.year)
            return true
        elseif isequal(a.year, b.year)
            return isless(a.miles, b.miles)
        else
            return false
        end
    else
        return false
    end
end

This version should be more efficient, but is very tedious. I gets even worse if the type Car has more than three data fields. Is there a solution that is both efficient and concise?

lmiq · February 13, 2021, 2:11pm

Could be like this? Of course the order of evaluation of the fields should be specified somwhere, here
is just the order they appear in the struct.

julia> struct Car
           model::String
           year::Int32
           miles::Int32
       end

julia> fieldnames(Car)
(:model, :year, :miles)

julia> function isless(c1::Car,c2::Car)
         for field in fieldnames(Car)
           if getfield(c1,field) < getfield(c2,field)
             return true
           elseif getfield(c1,field) > getfield(c2,field)
             return false
           end
         end
         return false
       end
isless (generic function with 1 method)

julia> c1 = Car("Fusca",1975,10_000)
Car("Fusca", 1975, 10000)

julia> c2 = Car("Chevete",1985,20_000)
Car("Chevete", 1985, 20000)

julia> isless(c1,c2)
false

julia> isless(c2,c1)
true

julia>

Edit: getfield, if I am not mistaken, is a inherently type-unstable function. Thus, if this is going to be critical for performance, other alternatives are probably better, even if less general.

You may need to fall into this, at the end: Macro to write function with many conditionals - #8 by Skoffer

stevengj · February 13, 2021, 3:40pm

No it doesn’t. Tuples that are only used as local variables are typically not explicitly allocated:

julia> a = b = Car("Honda CRV", 2008, 173234)
Car("Honda CRV", 2008, 173234)

julia> using BenchmarkTools

julia> @btime isless($a, $b);
  14.624 ns (0 allocations: 0 bytes)

Skoffer · February 13, 2021, 4:09pm

Your first instinct was correct. You should use tuples, which are not allocated in this case (you can verify it with @code_native). Even better, tuples comparison is written in a more performant manner compared to your implementation of isless

struct Car
    model::String
    year::Int32
    miles::Int32
end

import Base: isless

isless(a::Car, b::Car) = isless((a.model, a.year, a.miles), (b.model, b.year, b.miles))

struct Car2
    model::String
    year::Int32
    miles::Int32
end

function isless(a::Car2, b::Car2)
    if isless(a.model, b.model)
        return true
    elseif isequal(a.model, b.model)
        if isless(a.year, b.year)
            return true
        elseif isequal(a.year, b.year)
            return isless(a.miles, b.miles)
        else
            return false
        end
    else
        return false
    end
end

# Data preparation
using Random
models = [randstring('A':'Z', 6) for _ in 1:10]
cars = [(rand(models), rand(2000:2020), rand(5:10:250)*1000) for _ in 1:10_000]
cars1 = [Car(x...) for x in cars]
cars2 = [Car2(x...) for x in cars]

# Comparison
using BenchmarkTools
julia> @btime sort($cars1);
  1.777 ms (4 allocations: 234.59 KiB)

julia> @btime sort($cars2);
  1.909 ms (4 allocations: 234.59 KiB)

lmiq · February 13, 2021, 4:18pm

Is it possible to expand all the fields of a struct into a tuple to avoid writing them explicitly in such a case, and write a generic code?

Topic		Replies	Views
Structs with custom ordering in a heap New to Julia performance , sort , data_structures	2	1394	September 14, 2019
Meaning of `isless`, `<`, `cmp`, `isequal`, `==`, total order, partial order, unordered Internals & Design	24	2283	January 7, 2018
Help me implement a sorting method New to Julia sort	6	675	April 10, 2019
Sorting strings containing numbers, so that "A2" < "A10"? General Usage strings , sort	28	6942	August 16, 2021
Sorting custom types New to Julia sort	14	3224	March 23, 2018

Efficient lexicographic ordering of custom types?

Related topics