Help with Threading

Hello guys,

I'm not sure if I'm in the right place to ask this kind of question; if not, I'm really sorry.

My question is about threading. I have a Julia script that uses a lot of recursion, and inside the recursive function there is a for loop that I would like to run in parallel.

function rec(parentValue, depth)
    ...
#   SOME CALCULATIONS
    ...      
    resultValue = 0

    for i = 0:N         
        ...
         
        if .. ..
            resultValue += rec(value, depth+1)
        end            
    end

    return resultValue
end

The number of instances of this function grows exponentially.

What I want is to run the for loop in parallel and, after the loop, join the threads back together and get the summed resultValue.

How could I achieve something like that?

I would also like to define how many threads this script/Julia uses. How could I do that?

Thanks in advance.

See this blog post for some inspiration on threaded recursion


Hi! fetch and wait block the calling task. If you fetch each result before spawning the next task, you effectively get sequential execution. What you want is to first spawn all the tasks and then wait on the results.

function rec(parentValue, depth)
    ...
#   SOME CALCULATIONS
    ...
    resultValue = 0
    tasks = Task[]

    for i = 0:N
        ...

        if .. ..
            push!(tasks, Threads.@spawn rec(value, depth+1))
        end
    end

    for task in tasks
        resultValue += fetch(task)
    end

    return resultValue
end
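
To control the number of threads: it is fixed when Julia starts, so launch it with e.g. julia --threads 16 (or set the JULIA_NUM_THREADS environment variable before starting) and check it with Threads.nthreads(). For reference, here is a minimal self-contained sketch of the same spawn-then-fetch pattern, with a made-up workload standing in for your calculations:

# made-up workload standing in for the real calculation
heavy_work(x) = sum(sin, 1:100_000) + x

function rec_demo(value, depth)
    depth >= 3 && return heavy_work(value)   # arbitrary stopping condition

    tasks = Task[]
    for i in 0:4
        # spawn first ...
        push!(tasks, Threads.@spawn rec_demo(value + i, depth + 1))
    end

    # ... then collect the results afterwards
    return sum(fetch, tasks)
end

@show Threads.nthreads()   # how many threads this session has
@show rec_demo(0.0, 0)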

Ah, this is what I was looking for!
Thank you!

Could there maybe be a problem if you have too many recursive calls?

Sequential takes ~1 minute and parallel on 16 threads ~2 minutes for the current N I'm trying.

I basically tried to rewrite my C++ code in Julia, and in C++ changing the thread count changes the calculation time as expected, but in Julia 16 threads are slower than not using threading at all.

Hi,
a pattern that I use quite often for non-recursive parallelization (similar to a threaded loop, but with dynamic scheduling) is:

futures = [Threads.@spawn f(i) for i=1:n]
results = fetch.(futures)

I do not know if this is useful for your use case, however.


There is a (small) overhead for spawning a task. If this overhead is larger than the calculation time of the task itself, the overall run time may end up larger than in the single-threaded case.
You could try to split your problem into a number of chunks, with each chunk having more than one value to calculate.
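
For example, something along these lines (just a sketch; work and the chunk count are made up here, Iterators.partition does the splitting):

work(i) = sum(sin, 1:1_000) * i     # stand-in for the per-value calculation

function chunked_sum(values, nchunks)
    chunksize = cld(length(values), nchunks)
    chunks = Iterators.partition(values, chunksize)
    # one task per chunk instead of one task per value
    tasks = [Threads.@spawn sum(work, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

chunked_sum(1:10_000, Threads.nthreads())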

Btw., you can write resultValue = sum(results); your implementation is not type-stable if a result has a type different from Int (resultValue = 0 starts the accumulator as an Int).
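
Applied to the task-based version above, that would be something like:

resultValue = sum(fetch.(tasks))   # or sum(fetch, tasks); no Int-typed accumulator needed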

Either push!(vector, Threads.@spawn ...) or vector[i] = Threads.@spawn ... will add the tasks to the vector on the main thread, so there's no deadlock. If you know in advance how many tasks you will spawn, you can create the vector with the correct number of elements up front to speed things up. But if that speedup makes a measurable difference, I would wonder whether your calculation is so tiny that spawning tasks adds an overhead that actually makes your code slower.
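
A sketch of the preallocated variant (ntasks and some_work are made up here):

some_work(i) = sum(sin, 1:10_000) + i       # placeholder for the real calculation
ntasks = 100                                # assumed to be known in advance

tasks = Vector{Task}(undef, ntasks)
for i in 1:ntasks
    tasks[i] = Threads.@spawn some_work(i)
end
resultValue = sum(fetch, tasks)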

AFAIK, @sync and @async apply to Julia coroutines, which only overlap work if there is I/O involved, either by reading from/writing to a stream or by calling an external program with run. I wish they worked with the Julia threading API as well, or maybe I'm just out of date on the latest Julia developments.


The code suggested by @lungben is probably equivalent to your previous attempts (maybe exactly equivalent to the case where you know the size of the vector in advance); it just uses some Julia syntactic sugar to write fewer lines.

There's no way to tell why threading is slower than single-threaded execution in your case unless you provide a full code example.

If each recursion is a quick calculation, then a new thread for each one is definitely going to slow you down. But single-threaded does seem like a waste. Can you try parallelizing only one level of depth? Then everything below that depth will stay internal to its thread, but you'll still get all threads running. Something like:

function rec(value, depth=1)
    results = zeros(N + 1)               # one slot for each i = 0:N
    if depth == 1
        Threads.@threads for i = 0:N
            @inbounds results[i+1] = rec(value, depth+1)
        end
    else
        for i = 0:N
            @inbounds results[i+1] = rec(value, depth+1)
        end
    end
    return sum(results)
end

(edited, correcting stupidity)

Just throwing this out there. I’ll play with something this afternoon if I get time.

Yeah, when I was playing with depth == X, I found (obvious in hindsight) that I couldn't get more than N threads to trigger, regardless of depth. Another possibility might be depth % X == 0, so you'd get some sub-spawning to push the thread count up without going all the way down to one calculation per task.
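
Something like this, maybe (a rough sketch; X is just a tuning knob and the stopping rule is made up, and @spawn is used so the nesting composes):

N = 3     # branching factor, as in the demo below
X = 2     # spawn a fresh batch of tasks every X levels

function rec_mod(value, depth=1)
    depth > 6 && return float(value)        # made-up stopping rule
    if depth % X == 0
        # parallel level: one task per branch
        tasks = [Threads.@spawn rec_mod(value + i, depth + 1) for i in 1:N]
        return sum(fetch, tasks)
    else
        # sequential level
        return sum(rec_mod(value + i, depth + 1) for i in 1:N)
    end
end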

Here is the code I was using to test this, by the way. It will graphically show you the recursion and thread structure. Maybe you can modify it to more closely resemble what you’ve got.

N = 3

# id of the first node at a given depth in a breadth-first numbering
rootid(depth) = depth == 0 ? 0 : sum(N^x for x in 0:(depth-1))
# id of the first child of node `job`, which sits at depth `depth`
newid(job, depth) = rootid(depth+1) + N*(job-rootid(depth))

function recursive(job=0, depth=0)
    println("$(' '^depth) $job, $depth, $(Threads.threadid())")
    depth > N && return job + 1
    results = zeros(N)
    if depth == 2
        Threads.@threads for i in 1:N
            results[i] = recursive(newid(job, depth) + i-1, depth + 1)
        end
    else
        for i in 1:N
            results[i] = recursive(newid(job, depth) + i-1, depth + 1)
        end
    end
    return sum(results)
end
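
(To actually see the thread ids change at depth 3, start Julia with more than one thread, e.g. julia --threads 4.)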