Efficient lexicographic ordering of custom types?

Suppose I have the Julia type

struct Car
    model::String
    year::Int32
    miles::Int32
end

I’d like to overload the isless() function, to enable sorting variables of the type Array{Car, 1}. I would like to sort according to model first, then year, and lastly miles. One implementation is, using Julia’s built-in lexicographic ordering for Tuples,

import Base.isless
function isless(a::Car, b::Car)
    isless((a.model, a.year, a.miles), (b.model, b.year, b.miles))
end

But this involves memory allocation for two Tuples, which can slow down performance when sorting a large array. An alternative implementation is

import Base.isless
function isless(a::Car, b::Car)
    if isless(a.model, b.model)
        return true
    elseif isequal(a.model, b.model)
        if isless(a.year, b.year)
            return true
        elseif isequal(a.year, b.year)
            return isless(a.miles, b.miles)
        else
            return false
        end
    else
        return false
    end
end

This version should be more efficient, but is very tedious. I gets even worse if the type Car has more than three data fields. Is there a solution that is both efficient and concise?

2 Likes

Could be like this? Of course the order of evaluation of the fields should be specified somwhere, here
is just the order they appear in the struct.

julia> struct Car
           model::String
           year::Int32
           miles::Int32
       end

julia> fieldnames(Car)
(:model, :year, :miles)

julia> function isless(c1::Car,c2::Car)
         for field in fieldnames(Car)
           if getfield(c1,field) < getfield(c2,field)
             return true
           elseif getfield(c1,field) > getfield(c2,field)
             return false
           end
         end
         return false
       end
isless (generic function with 1 method)

julia> c1 = Car("Fusca",1975,10_000)
Car("Fusca", 1975, 10000)

julia> c2 = Car("Chevete",1985,20_000)
Car("Chevete", 1985, 20000)

julia> isless(c1,c2)
false

julia> isless(c2,c1)
true

julia>

Edit: getfield, if I am not mistaken, is a inherently type-unstable function. Thus, if this is going to be critical for performance, other alternatives are probably better, even if less general.

You may need to fall into this, at the end: Macro to write function with many conditionals - #8 by Skoffer

No it doesn’t. Tuples that are only used as local variables are typically not explicitly allocated:

julia> a = b = Car("Honda CRV", 2008, 173234)
Car("Honda CRV", 2008, 173234)

julia> using BenchmarkTools

julia> @btime isless($a, $b);
  14.624 ns (0 allocations: 0 bytes)
5 Likes

Your first instinct was correct. You should use tuples, which are not allocated in this case (you can verify it with @code_native). Even better, tuples comparison is written in a more performant manner compared to your implementation of isless

struct Car
    model::String
    year::Int32
    miles::Int32
end

import Base: isless

isless(a::Car, b::Car) = isless((a.model, a.year, a.miles), (b.model, b.year, b.miles))

struct Car2
    model::String
    year::Int32
    miles::Int32
end

function isless(a::Car2, b::Car2)
    if isless(a.model, b.model)
        return true
    elseif isequal(a.model, b.model)
        if isless(a.year, b.year)
            return true
        elseif isequal(a.year, b.year)
            return isless(a.miles, b.miles)
        else
            return false
        end
    else
        return false
    end
end

# Data preparation
using Random
models = [randstring('A':'Z', 6) for _ in 1:10]
cars = [(rand(models), rand(2000:2020), rand(5:10:250)*1000) for _ in 1:10_000]
cars1 = [Car(x...) for x in cars]
cars2 = [Car2(x...) for x in cars]

# Comparison
using BenchmarkTools
julia> @btime sort($cars1);
  1.777 ms (4 allocations: 234.59 KiB)

julia> @btime sort($cars2);
  1.909 ms (4 allocations: 234.59 KiB)
3 Likes

Is it possible to expand all the fields of a struct into a tuple to avoid writing them explicitly in such a case, and write a generic code?