Sum of n array by recursion in Parallel threads using spawn

I am creating a function that will be call n number of times in recursion to sum the elements of an array in parallel using spawn to divide the array into two parts.

For example Recursion(a, lo, hi)

a = (1,5,6,7,8,8,8,8,9,9)

rt1 = recurision(a, lo, mid)
rt2 = recursion(a, mid+1, hi)

then adding the sum of the return values from the first and second call. I am unable to store the values of rt1 and rt2 in a global variable to return the total sum of an array.

what have you tried so far?

2 Likes

The problem is when I return the sum, it only returns the sum of the last two recursions, because I have to initialize the value of the return which gets reset in every run

you’re doing it wrong if your “recursion” modifies/relies on some global variable

3 Likes

Can you please suggest the right way?

Here is the naive way of recursively writing the Fibonacci sequence:

julia> function fib(n)
           if n > 1
               fib(n-1) + fib(n-2)
           else
               n
           end
       end
fib (generic function with 1 method)

See how at each recursive step, you pass a the decremented argument back to the function? You can do something very similar with sum, except it’d involve splitting original the array into halves at each step recursively (and ideally, using view so you don’t allocate a whole new array each time)


By the way, if you’re looking for a ready-made parallel sum, check out the ThreadsX.jl package:

julia> using ThreadsX

julia> let v = randn(1_000_000)
           s1 = @btime ThreadsX.sum($v, basesize=100_000)
           s2 = @btime sum($v)
           s1 ≈ s2
       end
  75.279 μs (860 allocations: 36.28 KiB)
  251.748 μs (0 allocations: 0 bytes)
true

For smaller array sizes it’s pretty easy to do a lot better than ThreadsX is doing here unfortunately. Not really sure what’s wrong with it’s strategy, but a simple recursive @spawn sum can be a fair bit faster for smaller arrays.

3 Likes

Thanks, I am aware of the basic recursive functions, but here when I use Spawn to divide the tasks and run them in parallel, I am unable to get the point of summing the output of spawned tasks.

Thanks, I am aware of the basic recursive functions,

Ah sorry, it’s a little hard to understand what you do and don’t know here, as the way you talked about using a global variable was setting off a lot of warning lights for me.


when I use Spawn to divide the tasks and run them in parallel, I am unable to get the point of summing the output of spawned tasks.

Does this help?

function fib(n::Int)
    if n < 2
        return n
    end
    t = @spawn fib(n - 2)
    return fib(n - 1) + fetch(t)
end

The idea is that you first spawn one task, then do the other computation, and then use fetch to get the result of the first task and add them together.

2 Likes

If you post the code that you already have, it will be easier to help.

1 Like

If you just want to sum one array in parallel, a nice simple way to do it is

function tsum(xs)
    if length(xs) <= 1048576
        return sum(xs)  # base case
    else
        t = Threads.@spawn tsum(@view xs[begin:begin+(end-begin+1)÷2])
        return tsum(@view xs[begin+(end-begin+1)÷2+1:end]) + fetch(t)
    end
end
1 Like

Perfect solution, thanks and it worked.

By the way, unless you wnat to learn how to implement parallel very low-level code, I strongly advice to not use @spawn directly. It’s especially true for a high-level task like this. FYI, I wrote an overview for the high-level approach here: A quick introduction to data parallelism in Julia

7 Likes

Thanks, I will surely read your article!