How to run tasks in parallel?

Skoffer · February 22, 2020, 3:25pm

I am trying to understand how to work properly with multithreading. Currently, I am on julia 1.3.1, which is started with JULIA_NUM_THREADS=4 julia and trying to simulate simple embarrassingly parallel task.

f(x) = sum(x)

x1 = rand(10_000_000)
x2 = rand(10_000_000)

function g(x1, x2)
    t1 = Threads.@spawn f(x1)
    t2 = Threads.@spawn f(x2)
    fetch(t1) + fetch(t2)
end

When I am trying to benchmark I get strange results

julia> @btime f($x1)
  4.707 ms (0 allocations: 0 bytes)

julia> @btime f($x2)
  4.652 ms (0 allocations: 0 bytes)

julia> @btime g($x1, $x2)
  9.096 ms (20 allocations: 1.56 KiB)

so it looks like instead of being run in parallel tasks are run sequentially. I suppose something simple is missed, but I can’t figure out what exactly.

It’s not an overhead problem, because I get similar results with different sizes of vectors

x1 = rand(1_000_000);
x2 = rand(1_000_000);

@btime g($x1, $x2)
#  673.669 μs (20 allocations: 1.56 KiB)
@btime f($x1)
#  281.192 μs (0 allocations: 0 bytes)

Jakub_Wronowski · February 22, 2020, 4:40pm

For me your usage looks fine. Maybe you are hitting memory bandwidth limit? What if your f would use more cpu and less memory?

function f(n)
    res = 0.0
    for i in 1:n
        res += rand()
    end
    return res
end

function g(n1, n2)
    t1 = Threads.@spawn f(n1)
    t2 = Threads.@spawn f(n2)
    fetch(t1) + fetch(t2)
end

julia> @btime f($10^8)
  563.086 ms (0 allocations: 0 bytes)
4.999384091918092e7

julia> @btime g($10^8, $10^8)
  770.733 ms (16 allocations: 1.41 KiB)
9.999607106383583e7

kristoffer.carlsson · February 22, 2020, 7:28pm

~~What does Threads.nthreads() say?~~ nvm

Skoffer · February 22, 2020, 7:35pm

You are absolutely right. Out of curiosity I implemented the same code in C++ with similar results. Almost no performance gain in simple sum, but substantial difference with more cpu intensive calculations. Guess that’s wonderful world of multi-threading for you. So I suppose rule of thumb is “if you have many lightweight calculations than you should try to combine them in one big kernel”?

Oh, now I suddenly understood what broadcast fusing was about…

Jakub_Wronowski · February 22, 2020, 7:46pm

Yes. Personally I really like functional approach (like transducers or fmap). Instead of processing arrays and store intermediate result, construct processing flow by lazy iteration and apply composed functions once.

could you elaborate? I am not into it, but i guess it something similar to transducers

Skoffer · February 22, 2020, 7:53pm

There is a great article, which definitely worth reading: More Dots: Syntactic Loop Fusion in Julia

Well, one of the idea is actually the same, if you have many vectorized computations than it’s more practical to combine them into a single loop. Of course in case of broadcast allocations of temporary arrays was more important, but memory locality took it part as well.

Jakub_Wronowski · February 22, 2020, 8:25pm

Thanks, the article is great, I like the comparison to map.
I have 2 thoughts related to this topic:

Languages like C# spend some extra cycles during iteration, so things like this execute faster in parallel. Julia does it so fast, that there is no room for that, RAM is just too slow to occupy CPU.
Memory is a bottleneck for modern CPUs (32+ cores). I am trying to develop sound synthesizer that needs to sum massive number of sinusoids. I discovered it is faster to generate samples from differential equation in cpu, rather than read and interpolate them from pre-calculated tables.

Topic		Replies	Views
Multithreaded computation significantly slower Performance question	18	3964	October 17, 2020
Embarrassingly parallel multi-threading doesn't scale Performance multithreading	17	1614	October 16, 2021
A question about parallel performance in multithreading Performance question , performance , multithreading	10	659	May 12, 2022
Multi-threading on a 2 CPU system New to Julia multithreading	15	1084	February 2, 2023
How to perform parallel vector addition? New to Julia question	24	1820	December 27, 2022

How to run tasks in parallel?

Related topics