x = rand(Float32, 10000000);
@time quantile!(x, [0.02, 0.98])
0.196571 seconds (2 allocations: 160 bytes)
2-element Vector{Float64}:
0.02001022338867188
0.9800319683551788
@time quantile!(x, 0.02)
0.022732 seconds (1 allocation: 16 bytes)
0.02001022338867188
@time quantile!(x, 0.98)
0.022851 seconds (1 allocation: 16 bytes)
0.9800319683551788
quantile! modifies (partially sorts) the input in place, so your second and third calls don’t need to spend any time sorting the input vector.
julia> v = [3, 2, 1]
3-element Vector{Int64}:
3
2
1
julia> quantile!(v, 0.5)
2.0
julia> v
3-element Vector{Int64}:
1
2
3
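To see how this affects the timings in the original post, note that the vector stays (largely) sorted after the first call, so subsequent calls skip most of the sorting work. A minimal sketch (absolute times are machine-dependent, and the first call also includes compilation):

```julia
using Statistics

x = rand(Float32, 10^6)
t_first  = @elapsed r1 = quantile!(x, [0.02, 0.98])  # must sort most of x in place
t_second = @elapsed r2 = quantile!(x, [0.02, 0.98])  # x is already largely sorted, so much cheaper
println("first call: $(t_first)s, second call: $(t_second)s")
```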
That said, I can confirm that performance is much worse than it ought to be:
julia> @btime quantile($x, [0.02, 0.98])
2.145 s (4 allocations: 38.15 MiB)
julia> @btime quantile($x, 0.02)
99.073 ms (2 allocations: 38.15 MiB)
To add to this: I am not very familiar with quantile!, but the difference is not due to the quantile vector being heap-allocated (as opposed to the individual probability arguments, which live on the stack). Using a stack-allocated static array does not remove the performance gap:
julia> using Statistics,StaticArrays,BenchmarkTools
julia> x = rand(Float32,10000000);StaticInput = SA[0.02,0.98];
julia> @btime quantile!($x,$StaticInput)
490.895 ms (0 allocations: 0 bytes)
2-element SVector{2, Float64} with indices SOneTo(2):
0.0199735164642334
0.9800283956527709
julia> @btime quantile!($x,0.02)
42.356 ms (0 allocations: 0 bytes)
0.0199735164642334
julia> @btime quantile!($x,0.98)
28.156 ms (0 allocations: 0 bytes)
0.9800283956527709
Your benchmark is broken: after the first warmup, even an interpolated x will already be sorted, because you reuse that variable throughout the benchmark. You have to use a new x for each run, so pass setup=(x=rand(Float32,10000000)) and evals=1 as additional arguments to make sure each unsorted x is only used once.
This doesn’t have anything to do with whether the array is allocated on the heap or not.
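A corrected benchmark along these lines might look as follows (a sketch: the setup/evals pattern is standard BenchmarkTools usage; a smaller vector and few samples are used here just to keep the run short, so the timings differ from the thread’s):

```julia
using Statistics, BenchmarkTools

# A fresh, unsorted x for every single evaluation, so no call ever sees presorted data.
t_vec = @belapsed quantile!(x, [0.02, 0.98]) setup=(x = rand(Float32, 1_000_000)) evals=1 samples=20
t_sca = @belapsed quantile!(x, 0.02)         setup=(x = rand(Float32, 1_000_000)) evals=1 samples=20
println("vector form: $(t_vec)s, scalar form: $(t_sca)s")
```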
Sorry for the initial confusion caused by using the in-place (!) form. But the non-mutating form shows the same behavior:
x = rand(Float32, 10000000);
xx = deepcopy(x);
@time quantile(x, [0.02, 0.98])
0.944558 seconds (4 allocations: 38.147 MiB)
2-element Vector{Float64}:
0.019991873502731322
0.9799974584579467
all(x .== xx)
true
@time quantile(x, 0.02)
0.082169 seconds (3 allocations: 38.147 MiB)
0.019991873502731322
all(x .== xx)
true
@time quantile(x, 0.98)
0.088702 seconds (3 allocations: 38.147 MiB)
0.9799974584579467
but so does quantile!. See:
x = rand(Float32, 10000000);
@time quantile!(x, [0.02, 0.98])
0.947875 seconds (2 allocations: 160 bytes)
2-element Vector{Float64}:
0.020006891489028934
0.9800460338592529
@time quantile!(x, 0.02)
0.021764 seconds (1 allocation: 16 bytes)
0.020006891489028934
@time quantile!(x, 0.98)
0.021982 seconds (1 allocation: 16 bytes)
0.9800460338592529
and if we repeat the vector form on the now partially sorted x, it is still much slower than the scalar calls:
@time quantile!(x, [0.02, 0.98])
0.194401 seconds (2 allocations: 160 bytes)
2-element Vector{Float64}:
0.020006891489028934
0.9800460338592529
If we broadcast quantile, the times look similar to the sum of the individual times:
julia> @time quantile.(Ref(x), [0.02, 0.98])
0.194654 seconds (9 allocations: 152.588 MiB)
# or, using BenchmarkTools:
@btime quantile.(Ref($x), [0.02, 0.98])
187.388 ms (6 allocations: 152.59 MiB)
Conclusion. The reason why quantile!(x, [0.02, 0.98]) is much slower than [quantile!(x, 0.02), quantile!(x, 0.98)] is that the former executes Statistics._quantilesort!(v, sorted, minp, maxp) with minp << maxp, while the latter executes Statistics._quantilesort!(v, sorted, p, p).
See https://github.com/JuliaLang/Statistics.jl/blob/master/src/Statistics.jl#L949-L959
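The cost difference can be reproduced with partialsort! directly, which is essentially what _quantilesort! delegates to. In this sketch, the lo/hi index arithmetic is a simplification of what the linked code actually computes:

```julia
w = rand(Float32, 10^6)
n = length(w)
lo, hi = round(Int, 0.02 * n), round(Int, 0.98 * n)

# Wide range: partialsort! must place ~96% of all elements in sorted order.
t_wide = @elapsed partialsort!(copy(w), lo:hi)

# Two single-index selections: each one only pivots around its target index.
t_narrow = @elapsed (partialsort!(copy(w), lo); partialsort!(copy(w), hi))

println("lo:hi range: $(t_wide)s, two scalar selections: $(t_narrow)s")
```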
Demonstration. (Jupyter notebook)
using Statistics
X = rand(Float32, 10^7)
x = copy(X); y = quantile!(x, [0.02, 0.98])
x = copy(X); z = [quantile!(x, 0.02), quantile!(x, 0.98)]
@show y == z;
y == z = true
x = copy(X); @time quantile!(x, [0.02, 0.98])
x = copy(X); @time quantile!(x, [0.02, 0.98])
x = copy(X); @time quantile!(x, [0.02, 0.98]);
1.171875 seconds (2 allocations: 192 bytes)
1.175514 seconds (2 allocations: 192 bytes)
1.174704 seconds (2 allocations: 192 bytes)
x = copy(X); @time [quantile!(x, 0.02), quantile!(x, 0.98)]
x = copy(X); @time [quantile!(x, 0.02), quantile!(x, 0.98)]
x = copy(X); @time [quantile!(x, 0.02), quantile!(x, 0.98)];
0.179641 seconds (3 allocations: 128 bytes)
0.196316 seconds (3 allocations: 128 bytes)
0.178824 seconds (3 allocations: 128 bytes)
Yes! The former is much slower than the latter.
Here is a version of https://github.com/JuliaLang/Statistics.jl/blob/master/src/Statistics.jl#L949-L959 with @time added to the two stages:
function myquantile!(v::AbstractVector, p::Union{AbstractArray, Tuple{Vararg{Real}}};
sorted::Bool=false, alpha::Real=1., beta::Real=alpha)
if !isempty(p)
minp, maxp = extrema(p)
print("Statistics._quantilesort!(v, sorted, minp, maxp): ")
@time Statistics._quantilesort!(v, sorted, minp, maxp)
end
print("map(x->Statistics._quantile(v, x, alpha=alpha, beta=beta), p):")
return @time map(x->Statistics._quantile(v, x, alpha=alpha, beta=beta), p)
end
myquantile!(v::AbstractVector, p::Real; sorted::Bool=false, alpha::Real=1., beta::Real=alpha) = begin
print("Statistics._quantilesort!(v, sorted, p, p): ")
@time Statistics._quantilesort!(v, sorted, p, p)
print("Statistics._quantile(v, p, alpha=alpha, beta=beta):")
@time Statistics._quantile(v, p, alpha=alpha, beta=beta)
end
println("myquantile!(x, [0.02, 0.98]):")
x = copy(X); u = myquantile!(x, [0.02, 0.98])
println("\n[myquantile!(x, 0.02), myquantile!(x, 0.98)]:")
x = copy(X); v = [myquantile!(x, 0.02), myquantile!(x, 0.98)]
println(); @show u == v;
Output:
myquantile!(x, [0.02, 0.98]):
Statistics._quantilesort!(v, sorted, minp, maxp): 1.208875 seconds
map(x->Statistics._quantile(v, x, alpha=alpha, beta=beta), p): 0.000001 seconds (1 allocation: 96 bytes)
[myquantile!(x, 0.02), myquantile!(x, 0.98)]:
Statistics._quantilesort!(v, sorted, p, p): 0.055211 seconds
Statistics._quantile(v, p, alpha=alpha, beta=beta): 0.000001 seconds
Statistics._quantilesort!(v, sorted, p, p): 0.098211 seconds
Statistics._quantile(v, p, alpha=alpha, beta=beta): 0.000022 seconds
u == v = true
As above, most of the computation time of the former is the execution time of Statistics._quantilesort!(v, sorted, minp, maxp) with minp << maxp.
Postscript. (case of [0.499, 0.501])
x = copy(X); @time y = quantile!(x, [0.499, 0.501])
x = copy(X); @time y = quantile!(x, [0.499, 0.501])
x = copy(X); @time y = quantile!(x, [0.499, 0.501]);
0.133984 seconds (2 allocations: 192 bytes)
0.138859 seconds (2 allocations: 192 bytes)
0.161751 seconds (2 allocations: 192 bytes)
x = copy(X); @time z = [quantile!(x, 0.499), quantile!(x, 0.501)]
x = copy(X); @time z = [quantile!(x, 0.499), quantile!(x, 0.501)]
x = copy(X); @time z = [quantile!(x, 0.499), quantile!(x, 0.501)];
0.235203 seconds (3 allocations: 128 bytes)
0.225585 seconds (3 allocations: 128 bytes)
0.200007 seconds (3 allocations: 128 bytes)
In this case, the former is faster than the latter: with minp ≈ maxp, a single narrow partial sort beats two separate selection passes over the data.
EmpiricalCDFs.jl uses a different approach that may be more efficient, depending on your use case:
julia> using EmpiricalCDFs
julia> X = rand(Float32, 10^7);
julia> cdf = EmpiricalCDF(X);
julia> sort!(cdf);
julia> const icdf2 = finv(cdf); # get the quantile function
julia> @btime icdf2.([.02, .98])
42.028 ns (2 allocations: 192 bytes)
2-element Vector{Float32}:
0.02000904
0.9799932
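If you would rather avoid a dependency, you can get a similar pay-once pattern with plain Statistics by sorting up front and passing sorted=true. This is a sketch of the same idea, not of what EmpiricalCDFs.jl does internally:

```julia
using Statistics

X = rand(Float32, 10^7)
xs = sort(X)                          # pay the O(n log n) sort once, up front
q(p) = quantile(xs, p; sorted=true)   # each lookup then skips sorting entirely
q.([0.02, 0.98])
```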