Why does sort! allocate inside loops when the input is large?

HMegh · October 18, 2025, 3:20am

Hi,

I have noticed that sort! allocates when called repetitively, but I am not sure why since I did not expect it to allocate:

v=rand(1000);
@btime sort!(v)
#  1.744 μs (0 allocations: 0 bytes)

In my code, I need to analyze columns of a large matrix of data. Typically, it is better to sort each column first before doing anything (like calculating quantiles, medians,…). Here’s a MWE

function quant(U)
    d,n=size(U)
    v=zeros(d)
    quants=zeros(n)
    for j=1:n
        copy!(v,view(U,:,j))
        sort!(v)
        quants[j]=quantile!(v,.2;sorted=true)
    end
    return quants
end

However, this function allocates and it is seems that the culprit is the sort!(v) line (without it, there are only 5-6 allocations).

U=rand(1000,1000)
@btime quant($U)
# 20.244 ms (6006 allocations: 9.82 MiB)

Curiously enough, when size(U,1) is small, there are no allocations

U=rand(10,1000);
@btime quant($U)
#  160.864 μs (5 allocations: 8.02 KiB)

Any ideas on why this is happening?

Oscar_Smith · October 18, 2025, 3:40am

TLDR is that Julia uses an algorithm called Scratch quick sort that uses an allocation in order to be both faster than regular quicksort and stable. See https://www.youtube.com/watch?v=RIhCBTx5TYA for details (this also might be a case where we use Radix sort which also allocates). We have an API for passing in a scratch buffer to sort but I’m not sure if that is part of the public interface. @Lilith can you confirm?

Oscar_Smith · October 18, 2025, 3:48am

Specifically, defining:

function quant(U)
    d,n=size(U)
    v=zeros(d)
    quants=zeros(n)
    scratch = similar(quants)
    for j=1:n
        copy!(v,view(U,:,j))
        sort!(v; scratch)
        quants[j]=quantile!(v,.2;sorted=true)
    end
    return quants
end

deletes half of the allocations and is 25% faster. (the other half come from a line in Radix sort that has the comment # TODO use scratch for this

HMegh · October 19, 2025, 1:10am

Thanks for sharing. I notice that Julia is not using ScratchQuickSort by default (at least for the example U=rand(1000,1000)), but when when I specify the algorithm, the allocations disappear. I will include some benchmarks below in case someone stumbles upon this thread in the future.

function quant(U,alg)
    d,n=size(U)
    v=zeros(d)
    quants=zeros(n)
    scratch=zeros(d)
    for j=1:n
        copy!(v,view(U,:,j))
        sort!(v;alg=alg,scratch=scratch)

        quants[j]=quantile!(v,.2;sorted=true)
    end
    return quants
end
U=rand(1000,1000);
@btime quant($U,$(Base.Sort.ScratchQuickSort()));
#  42.819 ms (9 allocations: 23.65 KiB)

This is faster than the regular QuickSort

@btime quant($U,$(QuickSort))
# 55.677 ms (9 allocations: 23.65 KiB)

But it is slower than the DefaultStable algorithm (which alllocates)

@btime quant($U,$(Base.Sort.DefaultStable()));
# 16.796 ms (3009 allocations: 2.14 MiB)

As far as I can tell, there is a trade-off between allocations and speed,

Lilith · October 21, 2025, 5:17pm

Yep, this is all correct. And no scratch is not part of the public interface, largely because the appropriate eltype for the scratch array is non-obvious.

DNF · October 21, 2025, 9:31pm

In this particular case it is better to not sort first, and just leave it to the quantile! function (which can make do with a partial sort.) This might be different in a more realistic scenario, perhaps.

DNF · October 21, 2025, 9:46pm

I couldn’t find any sorting functions that do sort!(output, input). That would be a good solution in many cases, I think, and would allow you to do

sort!(output, view(U, :, j))

or similar with partialsort!

Topic		Replies	Views
Why is sorting matrices allocating and why is it sometimes slower than the naive approach? Performance sort , matrices	11	571	November 6, 2021
Understanding allocation pattern when using sort!() General Usage	12	411	June 15, 2022
Allocation when sorting a view Performance question	16	650	October 13, 2022
Fast permutation vector: code needed Performance sorting	38	833	March 21, 2023
Why does sortperm! allocate here? Performance sortperm , memory-allocation	4	313	March 24, 2024

Why does sort! allocate inside loops when the input is large?

Related topics