Fast non dominated sorting

I need a fast implementation of the so-called non dominated sorting algorithm. The best I could come up with so far is the following (the sorting is to be done on the basis of three criteria):

dominates(x, y) = all(x .<= y) && any(x .< y)

function nonDominatedSorting(arr::Array{Float64,2})
    fronts::Array{Array,1} = Array[]
    ind::Array{Int64,1} = collect(1:size(arr, 1))
    while !isempty(arr)
        s = size(arr, 1)
        red = dropdims(sum([ dominates(arr[i,:], arr[j,:]) for i in 1:s, j in 1:s ], dims=1) .== 0, dims=1)
        a = 1:s
        sel::Array{Int64,1} = a[red]
        push!(fronts, ind[sel])
        da::Array{Int64,1} = deleteat!(collect(1:s), sel)
        ind = deleteat!(ind, sel)
        arr = arr[da, :]
    end
    return fronts
end

The timing is not too bad:

using BenchmarkTools

scores = rand(Float64, (50, 3));

@btime nonDominatedSorting($scores)
# 10.039 ms (31998 allocations: 21.81 MiB)

But I need this for simulations, so at the moment I use the faster

using RCall

function nds(arr::Array{Float64,2})
    @rput arr
    R"""
    library(nsga2R)
    ranking <- fastNonDominatedSorting(arr)
    """
    return @rget ranking
end

which gives

@btime nds($scores)
# 2.354 ms (227 allocations: 9.97 KiB)

That’s fine for what I want, but I’d still be happier to have a pure Julia implementation. So I’d be grateful for any suggestions for making the first version faster.

You might have more answers in https://discourse.julialang.org/c/user/perf (reference this post, if you post there).
Here is a first improvement:

julia> x = rand(Int, 100000);

julia> y = rand(Int, 100000);

julia> dominates(x, y) = all(x .<= y) && any(x .< y)

julia> dominates2(x, y) = all(i -> x[i] <= y[i], eachindex(x)) && any(i -> x[i] < y[i], eachindex(x))
dominates2 (generic function with 1 method)

julia> _time(f, x, y) = @time f(x, y) # When `@time` is called from global scope, it has a fixed cost in term of speed and allocation so we make this dummy _time function
_time (generic function with 1 method)

julia> _time(dominates, x, y)
  0.000118 seconds (3 allocations: 16.594 KiB)
false

julia> _time(dominates, x, y)
  0.000186 seconds (3 allocations: 16.594 KiB)
false

julia> _time(dominates2, x, y)
  0.000001 seconds
false

julia> _time(dominates2, x, y)
  0.000001 seconds
false

The reason for this improvement is that dominates2 does create the intermediate array x .<= y so it does not allocate as much and it does not evaluate x <= y all the way and stops when it is false for one i.

4 Likes

You could try storing all data transposed so you can access arr[:,i] instead, as this is how it is stored in memory.

Also, this line
sum([ dominates(arr[i,:], arr[j,:]) for i in 1:s, j in 1:s ], dims=1)
creates a matrix just to sum over it. Writing out this loop would probably save you both a lot of allocations and time.

3 Likes

Thank you, both @blegat and @baggepinnen – these suggestions helped a lot. I now have

dominates(x, y) = all(i -> x[i] <= y[i], eachindex(x)) && any(i -> x[i] < y[i], eachindex(x))

function nonDominatedSorting(arr0::Array{Float64,2})
    arr = arr0'
    fronts::Array{Array,1} = Array[]
    ind::Array{Int64,1} = collect(1:size(arr, 2))
    while !isempty(arr)
        s = size(arr, 2)
        red = Array{Bool,2}(undef, s, s)
        for i in 1:s, j in 1:s
            red[i, j] = dominates(arr[:, i], arr[:, j])
        end
        sel::Array{Int64,1} = collect(1:s)[ dropdims(sum(red, dims=1) .== 0, dims=1) ]
        push!(fronts, ind[sel])
        da::Array{Int64,1} = deleteat!(collect(1:s), sel)
        ind = deleteat!(ind, sel)
        arr = arr[:,da]
    end
    return fronts
end

which is a lot faster than the R function I used:

scores = rand(Float64, (50, 3))

@btime nonDominatedSorting($scores)
# 394.493 μs (8801 allocations: 987.38 KiB)

This should give you another order of magnitude or so:

function dominates2(x, y)
    strict_inequality_found = false
    for i in eachindex(x)
        y[i] < x[i] && return false
        strict_inequality_found |= x[i] < y[i]
    end
    return strict_inequality_found
end

function nds2(arr)
    fronts = Vector{Int64}[]
    ind = collect(1:size(arr, 1))
    a = [arr[i, :] for i = 1:size(arr, 1)]
    while !isempty(a)
        s = length(a)
        red = [all(x -> !dominates2(x, y), a) for y in a]
        push!(fronts, ind[red])
        da = deleteat!(collect(1:s), red)
        ind = deleteat!(ind, red)
        a = a[da]
    end
    return fronts
end

Edit: You may want to check correctness, especially in the presence of equal values. Having a testset of tricky cases is good practice anyway.

4 Likes

Many thanks @GunnarFarneback – this is correct, as far as I have been able to check, and is indeed much faster:

@btime nds2($scores)
# 37.822 μs (266 allocations: 14.38 KiB)

Note that trsnaposing is lazy, this does not actually change the memory layout, so your arr[:,i] is still ineffective. If you do arr = copy(arr0') the data is copied and put into the correct memory layout.

2 Likes

Thanks, I didn’t know. This made the function about twice as fast, but it’s still slower than @GunnarFarneback’s nds2, so for now, I’ll go with that.

1 Like

I’m not sure what this thing is doing, but you can get more speedup (>2x over the one from @GunnarFarneback) by doing this:

function nds3(arr)
   fronts = Vector{Int64}[]
   ind = collect(axes(arr, 1))
   a = collect(eachrow(arr))
   while !isempty(a)
       red = [all(x -> !dominates2(x, y), a) for y in a]
       push!(fronts, ind[red])
       deleteat!(ind, red)
       deleteat!(a, red)
   end
   return fronts
end
5 Likes

That’s fantastic, many thanks!

What a nice community this is: within a couple of hours, the function got about 500 times faster thanks to the joint help of people on here!

8 Likes

BTW, if the scores matrix has a fixed, known and small size in the second dimension, you could restructure it as a vector of SVectors, to possibly get even further speedups.

3 Likes

Thanks, the only way I can get this to work is by using

scores = @SArray rand(50, 3)

but then the timing actually gets worse.

No I was thinking more along the lines of a Vector of SVectors, not a large SMatrix. StaticArrays are not particularly performant when they become large (>100 elements, roughly). Also, you may have to change the code, I think.

1 Like

Maybe something similar to this:

function nds4(arr)
   fronts = Vector{Int64}[]
   ind = collect(axes(arr, 1))
   a = SVector{3}.(eachrow(arr))
   while !isempty(a)
       red = [all(x -> !dominates2(x, y), a) for y in a]
       push!(fronts, ind[red])
       deleteat!(ind, red)
       deleteat!(a, red)
   end
   return fronts
end
4 Likes

It seems we just broke 1000-fold speedup limit :smiley:

6 Likes

Amazing! This gives

@btime nds4(scores)
# 5.510 μs (100 allocations: 7.30 KiB)

Many thanks to all!

1 Like

Julia. Come for the speed. Stay for the community.

7 Likes

Is there a sorting package or some pace to post this version to…?

2 Likes

It looks like I’m arriving late here, but here are a few more thoughts:

I tend to think that dynamic operations like deleteat! and push! are costly in general. One might want to avoid them by performing the appropriate swaps in order to move the elements to the beginning of the array. The algorithm can then continue working on an arr array of unchanged length, but looking only at the end of it. It is also possible to examine the dominance on elements one by one, instead of computing the full red array storing whether each element is dominated.

Here is an implementation trying to take advantage of these observations.

function nds5(arr)
    fronts = Vector{Int64}[]
    ind = collect(axes(arr, 1))
    a = SVector{3}.(eachrow(arr))
    N = length(a)

    # All elements in arr[1:ii-1] are dominated
    ii = 1
    while ii<=N
        start = ii
        @inbounds for i in ii:N
            for j in start:N
                dominates2(a[j], a[i]) && @goto l1
            end

            # Swap values at indices `ii` and `i` in `ind` and `a`
            ind[ii], ind[i] = ind[i], ind[ii]
            a[ii],   a[i]   = a[i],   a[ii]
            ii += 1

            @label l1
        end
        push!(fronts, ind[start:ii-1])
    end
    return fronts
end

Note that it is not strictly equivalent to the original functions, since the sorting is not stable : the same indices are output in a different order (whereas the original function would always have indices sorted in ascending order).

julia> nds4(scores)
9-element Array{Array{Int64,1},1}:
 [6, 42, 62, 76, 79, 91]                                                          
 [1, 14, 20, 23, 30, 37, 40, 50, 69, 75, 88]                                      
 [4, 10, 12, 18, 19, 22, 36, 48, 52, 61, 63, 64, 68, 83, 84, 87, 97, 98]          
 [2, 3, 11, 13, 21, 25, 26, 29, 38, 41  …  56, 57, 59, 60, 66, 71, 77, 80, 90, 96]
 [15, 16, 17, 35, 39, 43, 44, 45, 67, 70, 81, 82, 89, 94, 100]                    
 [5, 9, 24, 31, 32, 46, 49, 58, 65, 78, 85, 86]                                   
 [8, 27, 33, 34, 47, 53, 72, 73, 92]                                              
 [28, 74, 93, 95]                                                                 
 [7, 99]                                                                          
julia> nds5(scores)
9-element Array{Array{Int64,1},1}:
 [6, 42, 62, 76, 79, 91]                                                          
 [14, 20, 23, 30, 37, 40, 50, 69, 75, 88, 1]                                      
 [18, 19, 22, 10, 36, 12, 48, 52, 61, 63, 64, 68, 4, 83, 84, 87, 97, 98]          
 [11, 38, 41, 2, 13, 51, 25, 54, 55, 56  …  26, 3, 66, 29, 71, 21, 77, 80, 90, 96]
 [67, 39, 70, 15, 43, 44, 45, 81, 82, 16, 89, 17, 94, 35, 100]                    
 [9, 78, 5, 65, 31, 32, 85, 86, 24, 46, 49, 58]                                   
 [8, 33, 47, 27, 92, 53, 34, 72, 73]                                              
 [95, 28, 74, 93]                                                                 
 [99, 7]                                                                          

In the implementation above, it is interesting to observe that the ind array is shuffled in such a way that it contains all indices of the output partition in the correct order. All that is needed to reconstruct the output partition of 1:n is the list of part boundaries to correctly split the ind array.

The following implementation relies on a custom Partition type, which allows manipulating partitions of 1:n in the form of a 1D array of indices (which is a reordering of 1:n) and an array of boundaries. This avoids manipulating arrays of arrays:

struct Partition
    ind    :: Vector{Int64}
    bounds :: Vector{Int64}
end

Partition(n::Int64) = Partition(collect(1:n), [1])

function push_bound!(p::Partition, i::Int64)
    push!(p.bounds, i)
    p
end

Base.getindex(p::Partition, i::Int64) = if i==lastindex(p.bounds)
    view(p.ind, p.bounds[i]:lastindex(p.ind))
else
    view(p.ind, p.bounds[i]:p.bounds[i+1]-1)
end

function Base.show(io::IO, p::Partition)
    n = length(p.bounds)
    if p.bounds[n] == length(p.ind)+1
        n -= 1
    end
    println("$n-part Partition:")
    for i in 1:n
        println(" ", p[i])
    end
end

function nds6(arr)
    a = SVector{3}.(eachrow(arr))
    N = length(a)
    p = Partition(N)

    ii = 1
    while ii<=N
        start = ii
        @inbounds for i in ii:N
            for j in start:N
                dominates2(a[j], a[i]) && @goto l1
            end

            # Swap values at indices `ii` and `i` in `ind` and `a`
            p.ind[ii], p.ind[i] = p.ind[i], p.ind[ii]
            a[ii],     a[i]     = a[i],     a[ii]
            ii += 1

            @label l1
        end
        push_bound!(p, ii)
    end
    return p
end

which yields the same results as nds5, except stored in another fashion:

julia> nds6(scores)
9-part Partition:
 [6, 42, 62, 76, 79, 91]
 [14, 20, 23, 30, 37, 40, 50, 69, 75, 88, 1]
 [18, 19, 22, 10, 36, 12, 48, 52, 61, 63, 64, 68, 4, 83, 84, 87, 97, 98]
 [11, 38, 41, 2, 13, 51, 25, 54, 55, 56, 57, 59, 60, 26, 3, 66, 29, 71, 21, 77, 80, 90, 96]
 [67, 39, 70, 15, 43, 44, 45, 81, 82, 16, 89, 17, 94, 35, 100]
 [9, 78, 5, 65, 31, 32, 85, 86, 24, 46, 49, 58]
 [8, 33, 47, 27, 92, 53, 34, 72, 73]
 [95, 28, 74, 93]
 [99, 7]

Now some benchmarks:

using BenchmarkTools

function bench(N)
    scores = rand(Float64, (N, 3));
    @assert nds1(scores) == nds2(scores) == nds3(scores) == nds4(scores)
    print("nds1");  @btime nds1($scores)
    print("nds2");  @btime nds2($scores)
    print("nds3");  @btime nds3($scores)
    print("nds4");  @btime nds4($scores)
    print("nds5");  @btime nds5($scores)
    print("nds6");  @btime nds6($scores)

    nothing
end
julia> bench(50)
nds1  12.494 ms (40026 allocations: 27.14 MiB)
nds2  42.062 μs (265 allocations: 14.36 KiB)
nds3  13.697 μs (240 allocations: 7.97 KiB)
nds4  5.633 μs (94 allocations: 7.00 KiB)
nds5  4.675 μs (64 allocations: 5.77 KiB)
nds6  3.211 μs (59 allocations: 4.94 KiB)

julia> bench(50)
nds1  14.268 ms (46021 allocations: 31.52 MiB)
nds2  48.169 μs (295 allocations: 15.48 KiB)
nds3  17.536 μs (266 allocations: 8.58 KiB)
nds4  6.547 μs (100 allocations: 7.30 KiB)
nds5  5.779 μs (65 allocations: 5.86 KiB)
nds6  3.303 μs (59 allocations: 4.94 KiB)

Most of the remaining allocations in nds6 come from the Array-to-SArray conversion, which takes negligible time and is therefore probably not worth optimizing:

julia> f(x) = SVector{3}.(eachrow(x))
f (generic function with 1 method)

julia> @btime f($scores)
  693.521 ns (54 allocations: 4.20 KiB)


Full code (for anyone wanting to reproduce the results)
dominates(x, y) = all(x .<= y) && any(x .< y)

function dominates2(x, y)
    strict_inequality_found = false
    for i in eachindex(x)
        y[i] < x[i] && return false
        strict_inequality_found |= x[i] < y[i]
    end
    return strict_inequality_found
end

function nds1(arr::Array{Float64,2})
    fronts::Array{Array,1} = Array[]
    ind::Array{Int64,1} = collect(1:size(arr, 1))
    while !isempty(arr)
        s = size(arr, 1)
        red = dropdims(sum([ dominates(arr[i,:], arr[j,:]) for i in 1:s, j in 1:s ], dims=1) .== 0, dims=1)
        a = 1:s
        sel::Array{Int64,1} = a[red]
        push!(fronts, ind[sel])
        da::Array{Int64,1} = deleteat!(collect(1:s), sel)
        ind = deleteat!(ind, sel)
        arr = arr[da, :]
    end
    return fronts
end

function nds2(arr)
    fronts = Vector{Int64}[]
    ind = collect(1:size(arr, 1))
    a = [arr[i, :] for i = 1:size(arr, 1)]
    while !isempty(a)
        s = length(a)
        red = [all(x -> !dominates2(x, y), a) for y in a]
        push!(fronts, ind[red])
        da = deleteat!(collect(1:s), red)
        ind = deleteat!(ind, red)
        a = a[da]
    end
    return fronts
end

function nds3(arr)
   fronts = Vector{Int64}[]
   ind = collect(axes(arr, 1))
   a = collect(eachrow(arr))
   while !isempty(a)
       red = [all(x -> !dominates2(x, y), a) for y in a]
       push!(fronts, ind[red])
       deleteat!(ind, red)
       deleteat!(a, red)
   end
   return fronts
end

using StaticArrays

function nds4(arr)
   fronts = Vector{Int64}[]
   ind = collect(axes(arr, 1))
   a = SVector{3}.(eachrow(arr))
   while !isempty(a)
       red = [all(x -> !dominates2(x, y), a) for y in a]
       push!(fronts, ind[red])
       deleteat!(ind, red)
       deleteat!(a, red)
   end
   return fronts
end

function nds5(arr)
    fronts = Vector{Int64}[]
    ind = collect(axes(arr, 1))
    a = SVector{3}.(eachrow(arr))
    N = length(a)

    ii = 1
    while ii<=N
        start = ii
        @inbounds for i in ii:N
            for j in start:N
                dominates2(a[j], a[i]) && @goto l1
            end

            # Swap values at indices `ii` and `i` in `ind` and `a`
            ind[ii], ind[i] = ind[i], ind[ii]
            a[ii],   a[i]   = a[i],   a[ii]
            ii += 1

            @label l1
        end
        push!(fronts, ind[start:ii-1])
    end
    return fronts
end


struct Partition
    ind    :: Vector{Int64}
    bounds :: Vector{Int64}
end

Partition(n::Int64) = Partition(collect(1:n), [1])

function push_bound!(p::Partition, i::Int64)
    push!(p.bounds, i)
    p
end

Base.getindex(p::Partition, i::Int64) = if i==lastindex(p.bounds)
    view(p.ind, p.bounds[i]:lastindex(p.ind))
else
    view(p.ind, p.bounds[i]:p.bounds[i+1]-1)
end

function Base.show(io::IO, p::Partition)
    n = length(p.bounds)
    if p.bounds[n] == length(p.ind)+1
        n -= 1
    end
    println("$n-part Partition:")
    for i in 1:n
        println(" ", p[i])
    end
end

function nds6(arr)
    a = SVector{3}.(eachrow(arr))
    N = length(a)
    p = Partition(N)

    ii = 1
    while ii<=N
        start = ii
        @inbounds for i in ii:N
            for j in start:N
                dominates2(a[j], a[i]) && @goto l1
            end

            # Swap values at indices `ii` and `i` in `ind` and `a`
            p.ind[ii], p.ind[i] = p.ind[i], p.ind[ii]
            a[ii],     a[i]     = a[i],     a[ii]
            ii += 1

            @label l1
        end
        push_bound!(p, ii)
    end
    return p
end

using BenchmarkTools

function bench(N)
    scores = rand(Float64, (N, 3));
    @assert nds1(scores) == nds2(scores) == nds3(scores) == nds4(scores)
    print("nds1");  @btime nds1($scores)
    print("nds2");  @btime nds2($scores)
    print("nds3");  @btime nds3($scores)
    print("nds4");  @btime nds4($scores)
    print("nds5");  @btime nds5($scores)
    print("nds6");  @btime nds6($scores)

    nothing
end
6 Likes

The ordering of the indices is unimportant: the function calculates the so-called Pareto fronts, and within any front all solutions are equivalent. In other words, nds5 would have done. I can’t now quite follow why, despite the extra work, nds6 is still faster. In any case, fantastic to see that a further speedup was still possible.

In response to @yakir12, the non-dominated sorting algorithm is a key component in Deb’s NSGA-II algorithm, which is one of the most popular algorithms for agent-based optimization. At the moment, there are three Julia packages for this, but two seem to have been abandoned and one is, at least for my purposes, not flexible enough. For me, it’s much more useful to have the components of the NSGA-II algorithm available (another key component is the crowding sorting distance) and then to combine them in whichever way best suits my needs for the case at hand. I could imagine a package with fast versions of the necessary functions (like the one developed in this thread for non dominated sorting) to be useful for some people.

2 Likes