Help with Threading

Hello guys,

I'm not sure if I'm in the right place to ask this kind of question; if not, I'm really sorry.

My question is about threading. I have a Julia script that uses a lot of recursion, and inside the recursive function there is a for loop that I would like to run in parallel.

function rec(parentValue, depth)
    ...
#   SOME CALCULATIONS
    ...      
    resultValue = 0

    for i = 0:N         
        ...
         
        if .. ..
            resultValue += rec(value, depth+1)
        end            
    end

    return resultValue
end

The number of instances of this function grows exponentially.

What I want is to run the for loop in parallel and, after the loop, join the threads back together and get the summed resultValue.

How could I achieve something like that?

I would also like to define how many threads this script/Julia uses. How could I do that?

Thanks in advance.

See this blog post for some inspiration on threaded recursion


Hi! fetch and wait block the calling task. If you fetch each result before spawning the next task, you effectively get sequential execution. What you want is to first spawn all the tasks and then wait on the results.

function rec(parentValue, depth)
    ...
#   SOME CALCULATIONS
    ...
    resultValue = 0
    tasks = Task[]

    for i = 0:N
        ...

        if .. ..
            push!(tasks, Threads.@spawn rec(value, depth+1))
        end
    end

    for task in tasks
        resultValue += fetch(task)
    end

    return resultValue
end
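
To control the number of threads: it is fixed when Julia starts, so launch it with e.g. julia --threads 16 (or set the JULIA_NUM_THREADS environment variable before starting) and check it with Threads.nthreads(). For reference, here is a minimal self-contained sketch of the same spawn-then-fetch pattern, with a made-up workload standing in for your calculations:

# made-up workload standing in for the real calculation
heavy_work(x) = sum(sin, 1:100_000) + x

function rec_demo(value, depth)
    depth >= 3 && return heavy_work(value)   # arbitrary stopping condition

    tasks = Task[]
    for i in 0:4
        # spawn first ...
        push!(tasks, Threads.@spawn rec_demo(value + i, depth + 1))
    end

    # ... then collect the results afterwards
    return sum(fetch, tasks)
end

@show Threads.nthreads()   # how many threads this session has
@show rec_demo(0.0, 0)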

Ah, this is what I was looking for!
Thank you!

Could there maybe be a problem if you have too many recursive calls?

Sequential takes ~1 minute and parallel on 16 threads ~2 minutes for the current N I'm trying.

I basically tried to rewrite my C++ code in Julia, and in C++ changing the thread count changes the calculation time as expected, but in Julia 16 threads are slower than not using threading at all.

Hi,
a pattern that I use quite often for non-recursive parallelization (similar to a threaded loop, but with dynamic scheduling) is:

futures = [Threads.@spawn f(i) for i=1:n]
results = fetch.(futures)

I do not know if this is useful for your use case, however.


There is a (small) overhead for spawning a task. If this overhead is larger than the calculation time of the task itself, the overall run time may end up larger than in the single-threaded case.
You could try to split your problem into a number of chunks, with each chunk having more than one value to calculate.
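
For example, something along these lines (just a sketch; work and the chunk count are made up here, Iterators.partition does the splitting):

work(i) = sum(sin, 1:1_000) * i     # stand-in for the per-value calculation

function chunked_sum(values, nchunks)
    chunksize = cld(length(values), nchunks)
    chunks = Iterators.partition(values, chunksize)
    # one task per chunk instead of one task per value
    tasks = [Threads.@spawn sum(work, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

chunked_sum(1:10_000, Threads.nthreads())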

Btw., you can write resultValue = sum(results); your implementation is not type-stable if a result has a type different from Int (resultValue = 0 starts the accumulator as an Int).
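
Applied to the task-based version above, that would be something like:

resultValue = sum(fetch.(tasks))   # or sum(fetch, tasks); no Int-typed accumulator needed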

Either push!(vector, Threads.@spawn ...) or vector[i] = Threads.@spawn ... will add the tasks to the vector on the main thread, so there's no deadlock. If you know in advance how many tasks you will spawn, you can create the vector with the correct number of elements up front to speed things up. But if that speedup makes a measurable difference, I would wonder whether your calculation is so tiny that spawning tasks adds an overhead that actually makes your code slower.
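
A sketch of the preallocated variant (ntasks and some_work are made up here):

some_work(i) = sum(sin, 1:10_000) + i       # placeholder for the real calculation
ntasks = 100                                # assumed to be known in advance

tasks = Vector{Task}(undef, ntasks)
for i in 1:ntasks
    tasks[i] = Threads.@spawn some_work(i)
end
resultValue = sum(fetch, tasks)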

AFAIK, @sync and @async apply to Julia coroutines, which only overlap work if there is I/O involved, either by reading from/writing to a stream or by calling an external program with run. I wish they worked with the Julia threading API as well, or maybe I'm just out of date on the latest Julia developments.


The code suggested by @lungben is probably equivalent to your previous attempts (maybe exactly equivalent to the case where you know the size of the vector in advance); it just uses some Julia syntactic sugar to write fewer lines.

There's no way to tell why threading is slower than single-threaded execution in your case unless you provide a full code example.

If each recursion is a quick calculation, then a new thread for each one is definitely going to slow you down. But single-threaded does seem like a waste. Can you try parallelizing only one level of depth? Then everything below that depth will stay internal to its thread, but you'll still get all threads running. Something like:

function rec(value, depth=1)
    results = zeros(N + 1)               # one slot for each i = 0:N
    if depth == 1
        Threads.@threads for i = 0:N
            @inbounds results[i+1] = rec(value, depth+1)
        end
    else
        for i = 0:N
            @inbounds results[i+1] = rec(value, depth+1)
        end
    end
    return sum(results)
end

(edited, correcting stupidity)

Just throwing this out there. I’ll play with something this afternoon if I get time.

Yeah, when I was playing with depth == X, I found (obvious in hindsight) that I couldn't get more than N threads to trigger, regardless of depth. Another possibility might be depth % X == 0, so you'd get some sub-spawning to push the thread count up without going all the way down to one calculation per task.
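
Something like this, maybe (a rough sketch; X is just a tuning knob and the stopping rule is made up, and @spawn is used so the nesting composes):

N = 3     # branching factor, as in the demo below
X = 2     # spawn a fresh batch of tasks every X levels

function rec_mod(value, depth=1)
    depth > 6 && return float(value)        # made-up stopping rule
    if depth % X == 0
        # parallel level: one task per branch
        tasks = [Threads.@spawn rec_mod(value + i, depth + 1) for i in 1:N]
        return sum(fetch, tasks)
    else
        # sequential level
        return sum(rec_mod(value + i, depth + 1) for i in 1:N)
    end
end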

Here is the code I was using to test this, by the way. It will graphically show you the recursion and thread structure. Maybe you can modify it to more closely resemble what you’ve got.

N = 3

# id of the first node at a given depth in a breadth-first numbering
rootid(depth) = depth == 0 ? 0 : sum(N^x for x in 0:(depth-1))
# id of the first child of node `job`, which sits at depth `depth`
newid(job, depth) = rootid(depth+1) + N*(job-rootid(depth))

function recursive(job=0, depth=0)
    println("$(' '^depth) $job, $depth, $(Threads.threadid())")
    depth > N && return job + 1
    results = zeros(N)
    if depth == 2
        Threads.@threads for i in 1:N
            results[i] = recursive(newid(job, depth) + i-1, depth + 1)
        end
    else
        for i in 1:N
            results[i] = recursive(newid(job, depth) + i-1, depth + 1)
        end
    end
    return sum(results)
end
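
(To actually see the thread ids change at depth 3, start Julia with more than one thread, e.g. julia --threads 4.)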