I am creating a function that will be call n number of times in recursion to sum the elements of an array in parallel using spawn to divide the array into two parts.
For example Recursion(a, lo, hi)
a = (1,5,6,7,8,8,8,8,9,9)
rt1 = recurision(a, lo, mid)
rt2 = recursion(a, mid+1, hi)
then adding the sum of the return values from the first and second call. I am unable to store the values of rt1 and rt2 in a global variable to return the total sum of an array.
what have you tried so far?
The problem is when I return the sum, it only returns the sum of the last two recursions, because I have to initialize the value of the return which gets reset in every run
you’re doing it wrong if your “recursion” modifies/relies on some global variable
Can you please suggest the right way?
Here is the naive way of recursively writing the Fibonacci sequence:
julia> function fib(n)
if n > 1
fib(n-1) + fib(n-2)
fib (generic function with 1 method)
See how at each recursive step, you pass a the decremented argument back to the function? You can do something very similar with
sum, except it’d involve splitting original the array into halves at each step recursively (and ideally, using
view so you don’t allocate a whole new array each time)
By the way, if you’re looking for a ready-made parallel sum, check out the ThreadsX.jl package:
julia> using ThreadsX
julia> let v = randn(1_000_000)
s1 = @btime ThreadsX.sum($v, basesize=100_000)
s2 = @btime sum($v)
s1 ≈ s2
75.279 μs (860 allocations: 36.28 KiB)
251.748 μs (0 allocations: 0 bytes)
For smaller array sizes it’s pretty easy to do a lot better than ThreadsX is doing here unfortunately. Not really sure what’s wrong with it’s strategy, but a simple recursive
@spawn sum can be a fair bit faster for smaller arrays.
Thanks, I am aware of the basic recursive functions, but here when I use Spawn to divide the tasks and run them in parallel, I am unable to get the point of summing the output of spawned tasks.
Thanks, I am aware of the basic recursive functions,
Ah sorry, it’s a little hard to understand what you do and don’t know here, as the way you talked about using a global variable was setting off a lot of warning lights for me.
when I use Spawn to divide the tasks and run them in parallel, I am unable to get the point of summing the output of spawned tasks.
Does this help?
if n < 2
t = @spawn fib(n - 2)
return fib(n - 1) + fetch(t)
The idea is that you first spawn one task, then do the other computation, and then use
fetch to get the result of the first task and add them together.
If you post the code that you already have, it will be easier to help.
If you just want to sum one array in parallel, a nice simple way to do it is
if length(xs) <= 1048576
return sum(xs) # base case
t = Threads.@spawn tsum(@view xs[begin:begin+(end-begin+1)÷2])
return tsum(@view xs[begin+(end-begin+1)÷2+1:end]) + fetch(t)
Perfect solution, thanks and it worked.
By the way, unless you wnat to learn how to implement parallel very low-level code, I strongly advice to not use
@spawn directly. It’s especially true for a high-level task like this. FYI, I wrote an overview for the high-level approach here: A quick introduction to data parallelism in Julia
Thanks, I will surely read your article!