Hi,
I was wondering how to parallelise the quicksort algorithm below (from Rosetta Code) in Julia?
In C with OpenMP one would use #pragma omp task. Example:
But what is the equivalent in Julia?
Thank you
You probably want multithreading, using something like the Threads.@threads macro. This is just the plain built-in kind of parallelism, but there are other packages that provide similar functionality.
As a start, using the inbuilt threads library is easy:
results = zeros(100)
Threads.@threads for i in 1:100
    results[i] = i^2  # iterations are split across the available threads
end
You will find this blog useful, as it shows an example of how to multithread mergesort:
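Since the question mentions #pragma omp task, the closer Julia analogue is Threads.@spawn, which creates a task that the scheduler may run on another thread. A minimal sketch (the numbers are only illustrative):

```julia
# Threads.@spawn is Julia's rough analogue of OpenMP's `#pragma omp task`:
# it creates a lightweight task that may run on another thread.
t = Threads.@spawn sum(1:1_000)   # runs concurrently with the code below
local_part = sum(1:1_000)         # work done on the current thread
total = local_part + fetch(t)     # fetch() waits for the task and returns its value
```

fetch(t) (or wait(t) when no return value is needed) is what joins the task back in, much like a taskwait in OpenMP.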
I have tried the following, but it makes the runtime worse.
function quicksort!(A,i=1,j=length(A))
if j > i
pivot = A[(j+i) >>> 1]
left, right = i, j
while left <= right
while A[left] < pivot
left += 1
end
while A[right] > pivot
right -= 1
end
if left <= right
A[left], A[right] = A[right], A[left]
left += 1
right -= 1
end
end
quicksort!(A,i,right)
quicksort!(A,left,j)
end
return A
end
function parallel_quicksort!(A,i=1,j=length(A))
if j > i
if j-i < 100_000
quicksort!(A, i, j)
return A
end
pivot = A[(j+i) >>> 1]
left, right = i, j
while left <= right
while A[left] < pivot
left += 1
end
while A[right] > pivot
right -= 1
end
if left <= right
A[left], A[right] = A[right], A[left]
left += 1
right -= 1
end
end
t = Threads.@spawn parallel_quicksort!(A,i,right)
parallel_quicksort!(A,left,j)
wait(t)
end
return A
end
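One thing to check before benchmarking (an assumption about a common pitfall, not something visible in your code): Threads.@spawn only helps if Julia was actually started with more than one thread; otherwise every task runs on the same thread and you pay the scheduling overhead with no parallelism.

```julia
# With a single thread, Threads.@spawn adds task overhead but no speedup.
n = Threads.nthreads()
println("Julia is running with $n thread(s)")
# Start Julia with `julia -t auto` (or `julia --threads 4`), or set the
# JULIA_NUM_THREADS environment variable, to enable multithreading.
```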
You may check the sort implementation in GitHub - tkf/ThreadsX.jl: Parallelized Base functions, by @tkf.
Wow! That is super complicated. I am not there yet in terms of understanding the Julia language.
Maybe the base Threads are not “light enough” for your recursive implementation. You may stop the recursion for large enough chunks, or try lighter threads like GitHub - JuliaSIMD/Polyester.jl: The cheapest threads you can find!
Yes, it is complicated, but optimal parallel sort is not an easy problem anyway. There is a discussion of where to stop the recursion in the file ThreadsX.jl/quicksort.jl at master · tkf/ThreadsX.jl · GitHub
Base.@kwdef struct ParallelQuickSortAlg{Alg,SmallSize,BaseSize} <: ParallelSortAlgorithm
smallsort::Alg = Base.Sort.DEFAULT_UNSTABLE
smallsize::SmallSize = nothing # lazily determined
basesize::BaseSize = 10_000
end
# `basesize` is tuned using `Float64`. Make it `eltype`-aware?
#
# Note: Small `smallsize` (e.g., 32) is beneficial on random data. However, it
# is slower on sorted data with small number of worker threads.
# TODO: Implement some heuristics for `smallsize`.
I am not looking for something optimal at all; I am more at the stage of figuring out how Julia works and how to do things, like translating a C OpenMP quicksort implementation to Julia. The implementation that you refer to is probably well optimised and very advanced. I am looking for something simple, like it is done in C.
Even in the serial case, you really don't want to recurse down to a length-1 base case if you care about performance.
Mathematically, recursing down to a trivial base case is the most elegant approach, but in practice you typically want a larger base case for performance reasons.
(Although this is well known, unfortunately it's not typically taught to computer-science majors as far as I know. Of course, the fact that most undergraduate computer-science programming seems to be taught in Python these days makes it harder to meaningfully discuss performance optimization.)
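A common way to realise this in practice (a sketch; the cutoff value is an assumption, not a tuned number) is to switch to insertion sort below some small length instead of recursing all the way down:

```julia
# Insertion sort as a quicksort base case: for small subarrays it is
# typically faster than further recursion.
function insertion_sort!(A, lo, hi)
    for i in lo+1:hi
        x = A[i]
        j = i - 1
        while j >= lo && A[j] > x
            A[j+1] = A[j]   # shift larger elements right
            j -= 1
        end
        A[j+1] = x
    end
    return A
end

v = [5, 2, 9, 1, 7]
insertion_sort!(v, 1, length(v))
# v is now [1, 2, 5, 7, 9]
```

In a quicksort one would call this whenever hi - lo falls below a cutoff (often somewhere in the tens; Base's sort uses a similar small-array fallback).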
Thank you for your input. I am not looking to optimise anything; maybe I did not express myself correctly, sorry for that. I am just trying to figure out how to do a simple parallel quicksort in Julia, like it is done in C and can be found all over the internet. I am new to Julia, so the first step is really managing to make things work before starting optimisation.
Also, I am not sure why you are referring to computer-science undergraduates. If you believe I am one, you are wrong; I have studied economics.
But you are looking for a performance speedup, in which case you should not be spawning a task for every single element of the array, which is what happens if you recurse down to a base case of size 1.
If you are benchmarking, then you have to pay at least some attention to performance issues, otherwise the results are pretty meaningless.
Actually, the recursion is stopped in @gitboy16's implementation.
I was reading this blog, and m is not defined in:
temp = temps[Threads.threadid()]
length(temp) < m-lo+1 && resize!(temp, m-lo+1)
copyto!(temp, 1, v, lo, m-lo+1)
You may think that you are not optimizing, but in order to make an O(n log n) expected-complexity algorithm run faster in parallel than sequentially, you have to. If it were an O(n) algorithm, like quickselect(), which is like quicksort() but only recurses into one of the two “halves”, it would be even harder!
You may have to parallelize the partition computation, because in the earlier stages (especially the first call) you will be partitioning the data sequentially, and it will take a few calls to ramp up to using all your cores. A parallel in-place partition is non-trivial; it is a typical Google interview question.
Please look at the C link I sent at the beginning. Look at solution 2, not 1 (because 1 is wrong). The partition is not parallelised there. If you wish, leave the partition to one side: how do you parallelise the rest?
I have made a few minor changes to your code here:
const SEQ_THRESH = 1 << 9
@inline function partition!(A, pivot, left,right)
@inbounds while left <= right
while A[left] < pivot
left += 1
end
while A[right] > pivot
right -= 1
end
if left <= right
A[left], A[right] = A[right], A[left]
left += 1
right -= 1
end
end
return (left,right)
end
function quicksort!(A, i=1, j=length(A))
if j > i
left, right = partition!(A, A[(j+i) >>> 1], i, j)
quicksort!(A,i,right)
quicksort!(A,left,j)
end
return
end
function parallel_quicksort!(A,i=1,j=length(A))
if j-i <= SEQ_THRESH
quicksort!(A, i, j)
return
end
left, right = partition!(A, A[(j+i) >>> 1], i, j)
t = Threads.@spawn parallel_quicksort!(A,i,right)
parallel_quicksort!(A,left,j)
wait(t)
return
end
function test(n)
b = rand(n)
for i = 1:5
print("Parallel : ")
a1 = copy(b); @time parallel_quicksort!(a1);
print("Sequential: ")
a2 = copy(b); @time quicksort!(a2);
println("")
@assert a1 == sort(a1)
@assert a1 == a2
end
end
test(2^20)
And I do get about a 2.33x speedup on a 2-core 2013 MacBook Pro with 16GB RAM.
~/p/quicksort.jl> julia -tauto --project=. test_qs.jl
Parallel : 0.057559 seconds (14.31 k allocations: 1.012 MiB, 7.69% compilation time)
Sequential: 0.114027 seconds
Parallel : 0.047487 seconds (13.52 k allocations: 992.812 KiB)
Sequential: 0.106023 seconds
Parallel : 0.047373 seconds (13.52 k allocations: 992.906 KiB)
Sequential: 0.106490 seconds
Parallel : 0.046449 seconds (13.48 k allocations: 991.656 KiB)
Sequential: 0.109551 seconds
Parallel : 0.045878 seconds (13.47 k allocations: 991.375 KiB)
Sequential: 0.107051 seconds
If you were to undo my changes one by one, it would be interesting to find out which aspect of your initial code caused the slowdown (probably some type instability). Somebody wiser to help here? @Elrod @stevengj perhaps.
CHANGES:
1. quicksort() returns nothing instead of A
2. extracted partition!() and made it @inline
3. marked the while block with @inbounds
4. timing happens inside the test function, not in the REPL

You may try to use a different random array for each timing.
It is intentional to test and time the same problem.
The first run is to have the code compiled. The rest to see the variation.
Successive runs of test() generate different problems (of the same size).
Thank you @pitsianis, it works on my side as well. I am struggling to understand what is wrong with my code. When I use code_warntype it does not show any type instability. It seems that it is your change 2 that fixes it, but why that is, I have no idea. It would be great to have the view of a senior developer on the subject. It might be a good way to learn what to do and what not to do when programming in Julia.