I have several vectors (with different element types and lengths) whose entries I wish to sort jointly in-place according to some key function, like first, or identity (lexicographic sort). For example:
n=10^7; x=rand(1:n,n); y=rand(1.0:n,n);
I’d like to make it as efficient as possible, but still use the built-in sort!. A good solution is:
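Something along these lines, using StructArrays (a sketch; StructArray((x, y)) wraps x and y without copying, so sorting the wrapper sorts them in-place):
using StructArrays
@time sort!(StructArray((x, y)), by=first, alg=QuickSort);  # sorts x and y together through the wrapper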
If x and y have different lengths, I would pass view(x,...).
Given that I will only be using StructArray on Vectors and only for sorting, I’d like to extract the minimal amount of code from that repo, to avoid another dependency. I was hoping someone with experience there could help me out.
Julia already has zip that iterates to the shortest length: for (i,j) ∈ zip([1,2,3,4,5],[1.0,2.0,3.0]) print("$i $j ") end
But sort!(zip(x,y)) does not work. Inspired by this, I’d like to create Zip as a lightweight version of StructArray. My attempt:
struct Zip{T<:Tuple} <: AbstractVector{Tuple}
vec::T
end;
Base.length(z::Zip) = minimum(length(v) for v ∈ z.vec) # truncate to shortest vector
Base.size(z::Zip) = (length(z),) # for AbstractVector compliance
Base.getindex(z::Zip, i::Int) = map(v -> v[i], z.vec) # Get tuple (x[i], y[i], z[i], ...)
Base.setindex!(z::Zip, vals::Tuple, i::Int) = foreach((x, y) -> x[i] = y, z.vec, vals) # assign in-place
@time sort!(Zip((x,y)), by=first, alg=QuickSort);
8.336718 seconds (265.34 M allocations: 5.931 GiB, 4.23% gc time)
Any idea why my approach is so much slower and causes more allocations and memory usage?
Umm, I still want my sorting to be in-place, so I will avoid sortperm, because doing p=sortperm(x); x=x[p]; y=y[p]; creates 3 new allocations of large vectors (in your above benchmark, sort uses twice as much memory as sort!, and you haven’t even done x=x[p]; y=y[p];).
If you use view, it doesn't create large allocations: you just need one allocation for the order vector; after that it's allocation-free, and you can re-use the order vector.
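Something like this (a rough sketch, assuming x and y have equal length; the names p, xs, ys are just for illustration):
p = Vector{Int}(undef, length(x))   # order vector: the single, re-usable large allocation
sortperm!(p, x)                     # fill p in-place with the permutation that sorts x
xs = view(x, p)                     # x in sorted order, no copy
ys = view(y, p)                     # y reordered the same way, no copy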
And I just want to reiterate that, unless you're GC-bound, the size of an allocation isn't so much of an issue; many small-ish allocations are usually more of a performance killer.
@eldee Hmm. But my use-case involves vectors of different types. How could I remedy this? I tried it with parametrization:
struct Zip2{tx,ty} <: AbstractVector{Tuple}
x::Vector{tx}
y::Vector{ty}
end;
function Base.length(a::Zip2{tx,ty}) ::Int where {tx,ty}
return min(length(a.x),length(a.y)) end # truncate to shortest vector
function Base.size(a::Zip2{tx,ty}) ::Tuple{Int64} where {tx,ty}
return (length(a),) end # for AbstractVector compliance
function Base.getindex(a::Zip2{tx,ty}, i::Int) ::Tuple{tx,ty} where {tx,ty}
return (a.x[i],a.y[i]) end # get tuple
function Base.setindex!(a::Zip2{tx,ty}, t::Tuple{tx,ty}, i::Int) ::Nothing where {tx,ty}
a.x[i] = t[1]; a.y[i] = t[2]; return nothing end # assign in-place
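Usage would then be analogous (a sketch of the call that gets benchmarked below):
@time sort!(Zip2(x, y), by=first, alg=QuickSort);  # same call pattern as for Zip above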
I think @eldee's solution works well because Julia can handle homogeneous tuples like Tuple{Float64, Float64} more efficiently than heterogeneous tuples like Tuple{Int, Float64}. So I'd avoid mixed-type tuples, and I think using @jling's sortperm! suggestion would be a good choice.
@mohamed.d180 What you suggest is not in-place (it creates a new data vector). I already have vectors x and y (many of them). I do not want to create any new arrays. I wish to sort x and y in-place as efficiently as possible.
Let me just point out that in-place operations need not be more efficient than working on a newly allocated vector. Allocations should really only be avoided in hot loops, e.g. within every sort step (cf. the original Zip version).
There might of course be plenty of other reasons you might want to work in-place (e.g. to reduce memory usage). In this case sort! does also seem to be slightly faster than sort:
julia> @btime sort!(x) setup=(x=rand(10^7));
202.232 ms (6 allocations: 76.30 MiB)
julia> @btime sort(x) setup=(x=rand(10^7));
215.259 ms (9 allocations: 152.60 MiB)
So I’d argue avoiding sort is typically not really necessary. There are good reasons to avoid sortperm though: it’s very slow.
julia> @btime sortperm(x) setup=(x=rand(10^7))
4.565 s (6 allocations: 152.59 MiB)
What’s happening here is that @btime runs the code block sort!(Zip2(x, y)) multiple times and reports the minimum execution time. But after the first run x and y are already sorted, so subsequent runs will be much faster. In this case you should use setup=..., which is executed before each run, to reinitialize x and y.
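For example (a sketch; evals=1 makes sure every evaluation sees freshly randomized, unsorted vectors, since sort! mutates its input):
using BenchmarkTools
@btime sort!(Zip2(x, y), by=first, alg=QuickSort) setup=(x=rand(1:$n, $n); y=rand(1.0:$n, $n)) evals=1;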
Also, the default alg seems to perform really poorly here. (The default by/lt is lexicographical, so essentially the same as by=first.)
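As for where the extra work comes from: inspecting setindex! with @code_warntype points at the culprit; a sketch of the call (output omitted):
using InteractiveUtils              # for @code_warntype
z = Zip((x, y))                     # Int64 / Float64 components, as above
@code_warntype setindex!(z, (1, 1.0), 1)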
(it’s clearer in the REPL where you get colour coding). The issue is that the compiler does not know the type of vals[j] (it's either Int64 or Float64, but it has no idea which one). The same applies to z.vec[j][i]. This is also why the (Float64, Float64) example does work fine: there the element type is the same whatever the index, so inference has no problem.
Simply unrolling the loop gives the compiler enough information: since vals isa Tuple{Int, Float64} and z.vec isa Tuple{Vector{Int64}, Vector{Float64}}, it knows that e.g. vals[1] isa Int, as is z.vec[1][i].
function Base.setindex!(z::Zip, vals::Tuple, i::Int)
z.vec[1][i] = vals[1]
z.vec[2][i] = vals[2]
end
@btime sort!(Zip((x, y)), by=first, alg=QuickSort) setup=(x=rand(1:$n,$n);y=rand(1.0:$n,$n))
# 799.739 ms (0 allocations: 0 bytes)
# 10000000-element Zip{Tuple{Vector{Int64}, Vector{Float64}}}:
The downside is that to unroll the loop you have to know length(z.vec) or length(vals) in advance (which feels like it really should be inferable from Tuple{Int, Float64}, but apparently isn't). Zip2 solves the issue essentially in this way, by restricting to precisely two vectors x and y.
To be more generic you could add length information in the Zip type.
struct ZipN{N, T<:Tuple} <: AbstractVector{Tuple}
vec::T
ZipN{N, T}(vecs::T) where {N, T} = length(vecs) == N ? new(vecs) : error("Lengths don't match!")
end
ZipN(vecs) = ZipN{length(vecs), typeof(vecs)}(vecs)
Base.length(z::ZipN{N}) where N = minimum(length(z.vec[i]) for i = 1:N) # truncate to shortest vector
Base.size(z::ZipN) = (length(z),) # for AbstractVector compliance
Base.getindex(z::ZipN, i::Int) = map(v -> v[i], z.vec) # Get tuple (x[i], y[i], z[i], ...)
# (Could probably exploit knowledge of N here, as we did for length)
@generated function Base.setindex!(z::ZipN{N}, vals::Tuple, i::Int) where N
ex = :()
for j = 1:N
ex = :($ex; z.vec[$j][i] = vals[$j])
end
return ex
end
This should then be essentially as fast as Zip2, while allowing for arbitrarily many elements in the Tuples:
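A sketch of the corresponding benchmark (same pattern as before; timings omitted since they depend on the machine):
@btime sort!(ZipN((x, y)), by=first, alg=QuickSort) setup=(x=rand(1:$n, $n); y=rand(1.0:$n, $n));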
Your answer is very helpful and insightful, much appreciated! I will look into @generated and might have some follow-up questions, if the Julia documentation and ChatGPT do not suffice.
That’s a very good question, especially since we had established that sortperm is way slower and sort!(::StructArray) calls it on the first component. Also note that we somehow don’t get any allocations.
Well, turns out sort!(::StructArray) is never used here, which you can check via @which (and @edit or @less to inspect the actual code). When specifying alg we’re falling back on the generic version in sort.jl:
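A sketch of that check (hypothetical call; the exact method and file it prints depend on your Julia and StructArrays versions):
using InteractiveUtils, StructArrays              # @which / @edit / @less
@which sort!(StructArray((x, y)), alg=QuickSort)  # lands in the generic method in base sort.jl here, not in StructArrays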
The only major difference between ZipN and StructArray I then see at the moment is that StructArray (safely) uses @inbounds. If we (unsafely) add @inbounds for ZipN
Base.length(z::ZipN{N}) where N = minimum(length(@inbounds(z.vec[i])) for i = 1:N) # (should be safe)
Base.getindex(z::ZipN{N}, i::Int) where N = map(v -> @inbounds(v[i]), z.vec) # unsafe
@generated function Base.setindex!(z::ZipN{N}, vals::Tuple, i::Int) where N
ex = Expr(:meta, :inline)
for j = 1:N
ex = :($ex; @inbounds(z.vec[$j][i] = vals[$j]))
end
return ex
end # unsafe
the performance is pretty much the same:
julia> @btime sort!(ZipN((xc,yc)), alg=QuickSort) setup=(xc=copy($x); yc=copy($y)); # copy instead of rand for fair comparison
63.996 ms (0 allocations: 0 bytes)
julia> @btime sort!(StructArray((xc,yc)), alg=QuickSort) setup=(xc=copy($x); yc=copy($y));
62.854 ms (0 allocations: 0 bytes)
I don’t know how to use this output. :( There are no occurrences of Any in this output. Similarly for @code_typed and @code_llvm. ChatGPT and DeepSeek suggest the problem is with view, but if StructArray can avoid allocations, so should ZipN.
I’m really not sure why length(::ZipN) behaves differently for z and zv. StructArrays avoids problems by just using the length of the first component. If we want to use the minimum of the component lengths, we can again use a generated function to get around the issue.
Base.length(z::ZipN{0}) = 0
@generated function Base.length(z::ZipN{N}) where N
min_args = (:(length(z.vec[$j])) for j = 1:N)
return :(min($(min_args...)))
end
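With this in place, the view-backed case can be re-benchmarked; a sketch, assuming zv-style wrappers around views of x and y as discussed above:
@btime sort!(ZipN((view(xc, 1:$n), view(yc, 1:$n))), alg=QuickSort) setup=(xc=copy($x); yc=copy($y));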