Dispatching by values is very slow - Feature or Problem?

your argument is “but it’s pointless because it doesn’t sort Vector!”, and the counterargument is exactly: because there’s niche!

I am not sure where you see a disagreement here. Do we agree that not using dispatching by value makes it much faster in this case? Therefore either we do not allow Base Vector if you want to dispatch by value because of the performance hit, or we do allow them to gain performance and you do not dispatch by value.

I don’t see the sense of allowing something if you know it is slower than Base. That’s my opinion though.

Let me nonetheless thank you @jling again for your comments they were very helpful.

1 Like

I think you’re talking past each other.

Yes, sorting small vectors is something some fields require. That doesn’t mean that the default implementation for Vectors should be able to handle that case perfectly (I happen to think it should too, but there are other problems that are more time consuming than sorting such short vectors, e.g. creating that vector in the first place). That’s why the usual recommendation for such small vectors is “use StaticArrays”, because the core language can’t iterate on implementation details as fast as a package can. I’ve opened a PR to allow SortingNetworks.jl to work with StaticArrays.jl and I’ve checked the compiled code - just passing in an SVector (e.g. swapsort(rand(SVector{4,Float64}))) inlines all branches, gets rid of all length checking etc. and is just as fast as using a Tuple. I can’t compare with your sortn! because I don’t have the fastest memory, i.e. I’m IO bound anyway and the fastest timing I can achieve is 1.3µs (which is the same for all variants of that size! sort!, sortn!, swapsort etc).

I’d encourage you to check out that PR and see with your machine whether swapsort on an SVector is faster than your branching version or not.

4 Likes

Here is a benchmark that supports the claim that sorting networks are faster for tuples:

n = 4
@benchmark sort!(x)    setup=(x=rand(n)) evals=1 samples=100000 #median = 96 ns
@benchmark sortn!(x)   setup=(x=rand(n)) evals=1 samples=100000 #median = 97 ns
@benchmark sort(x)     setup=(x=rand(n)) evals=1 samples=100000 #median = 145 ns
@benchmark swapsort(x) setup=(x=rand(n)) evals=1 samples=100000 #median = 212 ns
@benchmark swapsort(x) setup=(x=Tuple(rand(n))) evals=1 samples=100000 #median = 69 ns

full code:

using SortingNetworks, BenchmarkTools, Statistics

function sortn!(x::Vector)
    n = length(x)
    n == 2 && return sort2!(x)
    n == 3 && return sort3!(x)
    n == 4 && return sort4!(x)
end

function sort4!(x)
    swap(i,j) = x[j], x[i] = x[i], x[j]
    x[2]>x[3] && swap(2,3)
    x[1]>x[2] && swap(1,2); x[3]>x[4] && swap(3,4)
    x[2]>x[3] && swap(2,3)
    x[1]>x[2] && swap(1,2); x[3]>x[4] && swap(3,4)
    true
end

@benchmark sort!(x)    setup=(x=rand(n)) evals=1 samples=100000 #median = 96 ns
@benchmark sortn!(x)   setup=(x=rand(n)) evals=1 samples=100000 #median = 97 ns
@benchmark sort(x)     setup=(x=rand(n)) evals=1 samples=100000 #median = 145 ns
@benchmark swapsort(x) setup=(x=rand(n)) evals=1 samples=100000 #median = 212 ns
@benchmark swapsort(x) setup=(x=Tuple(rand(n))) evals=1 samples=100000 #median = 69 ns
1 Like

Thank you for those benchmarks! Seems like I really do have to upgrade my machine :sweat_smile:

Now if only the PR to SortingNetworks.jl for allowing any AbstractVector to be passed in would be merged :slight_smile: Until then, would you mind cloning the branch in question from my fork (found here), deving it into an environment and checking the speed when putting in StaticArrays? I’d expect the speed to be the same as for tuples, since that’s what StaticArrays is using internally.

Perhaps the compiler was already able to figure this out?

#add/free SortingNetworks
@benchmark sort!(x)    setup=(x=rand(n)) evals=1 samples=100000 #median = 96 ns
@benchmark sortn!(x)   setup=(x=rand(n)) evals=1 samples=100000 #median = 83 ns
@benchmark sort(x)     setup=(x=rand(n)) evals=1 samples=100000 #median = 140 ns
@benchmark swapsort(x) setup=(x=rand(n)) evals=1 samples=100000 #median = 212 ns
@benchmark swapsort(x) setup=(x=Tuple(rand(n))) evals=1 samples=100000 #median = 88 ns
@benchmark swapsort(x) setup=(x=StaticArray(rand(n))) evals=1 samples=100000) #median = 73 ns

#dev https://github.com/Seelengrab/SortingNetworks.jl.git
@benchmark sort!(x)    setup=(x=rand(n)) evals=1 samples=100000 #median = 96 ns
@benchmark sortn!(x)   setup=(x=rand(n)) evals=1 samples=100000 #median = 87 ns
@benchmark sort(x)     setup=(x=rand(n)) evals=1 samples=100000 #median = 143 ns
@benchmark swapsort(x) setup=(x=rand(n)) evals=1 samples=100000 #median = 206 ns
@benchmark swapsort(x) setup=(x=Tuple(rand(n))) evals=1 samples=100000 #median = 69 ns
@benchmark swapsort(x) setup=(x=StaticArray(rand(n))) evals=1 samples=100000) #median = 68 ns

No, there’s just a fallback method:

julia> using StaticArrays, SortingNetworks                                           
                                                                                     
julia> swapsort(rand(SVector{4,Float64}))                                            
([0.30452790929831275, 0.17044971202350268, 0.5731354773789699, 0.7998330355999902],)
                                                                                     
julia> swapsort(rand(SVector{4,Float64})) |> typeof                                  
Tuple{SVector{4, Float64}}                                                           

The returned object should be a tuple of numbers, but this is a tuple of a single array (which isn’t sorted at all). Compare with the output of a tuple:

julia> swapsort(Tuple(rand(4)))                                                  
(0.31303063934598596, 0.8332144225773782, 0.8612756394938614, 0.9229322452749227)
                                                                                 
julia> swapsort(Tuple(rand(4))) |> typeof                                        
NTuple{4, Float64}                                                               

The reason it couldn’t possibly have figured it out in the original code was because the methods in question were taking a Vector{T} argument, limiting the possible vectors to the true Vector type from Base. In the PR, I changed those signatures to take an AbstractVector instead, allowing anything that looks like a vector (like a SVector from StaticArrays) to be used.


When using the PR:

(jl_sJq4si) pkg> st                                                                              
      Status `/tmp/jl_sJq4si/Project.toml`                                                       
  [07e3d4f1] SortingNetworks v0.3.1 `../../d/Documents/Projects/juliaProjects/SortingNetworks.jl`
  [90137ffa] StaticArrays v1.2.9                                                                 
                                                                                                 
julia> using StaticArrays, SortingNetworks                                                       
                                                                                                 
julia> swapsort(rand(SVector{4,Float64}))                                                        
(0.14290259334067623, 0.6245153918717478, 0.6504626079016613, 0.8373470603702046)                
                                                                                                 
julia> swapsort(rand(SVector{4,Float64})) |> typeof                                              
NTuple{4, Float64}                                                                               
1 Like

Wow! You’re quite right. I had assumed the result would end up sorted. In ordinary code I imagine I’d get a type error from the output.

Well, if it weren’t for that fallback method, you’d have gotten a MethodError :slight_smile: I’m not sure why this fallback exists in the first place :man_shrugging:

To support splatting: swapsort(6, 4) == (4, 6).

Well, it’s not technically necessary for splatting to work. I also wouldn’t expect splatting of a single element into swapsort to work, there’s nothing to swap after all. It’s also not like the single argument function is some sort of recursion base case.

In any case, it’s been cleared up now :slight_smile: