# How to run two tasks on parallel?

if I have two tasks and I need to them on parallel. What is the organization to do that in Julia? For example the following

``````using LinearAlgebra, SparseArrays,
n = 1000; d = 10;
A1 = sprand(n,n,d/n); x1 = rand(n);
A2 = sprand(n,n,d/n); x2 = rand(n);
``````

`Threads.@spawn` is the usual way to do it. But doing a quick read up of Parallel Computing · The Julia Language and Multi-Threading · The Julia Language will probably get you up to speed.

2 Likes

Thanks for your feedback but actually, I did not know how to do that. Cause all the documentary refer to implement parallel in terms of `for loop`.

Here is my best (nonworking) attempt with asynchronous programming:

``````function f(n, d)
A1 = sprand(n,n,d/n); x1 = rand(n);
A2 = sprand(n,n,d/n); x2 = rand(n);

# I'm not sure why @task puts things out of scope
local r1, r2;

# Unfortunately, these don't return a value
t1 = @task r1 = A1*x1;
t2 = @task r2 = A2\x2;

# Schedule and wait
schedule.([t1, t2]);
wait.([t1, t2]);
end

#n = 1000; d = 10;
#r1, r2 = f(n, d)
``````

The best approach depends on your exact problem. It looks like you’re working with tasks that have widely different run times, and that will be significant as you scale things up.

I agree that most of the tools and documentation in Julia are intended for tasks that can be broken up evenly and iteratively. Figuring out how best to solve your particular problem might take a lot of creativity.

1 Like

``````using LinearAlgebra, SparseArrays

n = 1000; d = 10;
A1 = sprand(n,n,d/n); x1 = rand(n);
A2 = sprand(n,n,d/n); x2 = rand(n);

result1 = fetch(thr1)
result2 = fetch(thr2)
``````

In that case, one should start Julia with multiple threads initialized by

`\$> julia -t n`

where n is the number of threads.

6 Likes

Actually, I am not having values for `r1, r2` any reason for that?

I’m wondering the same thing. I guess using @task changes the scope, but I wasn’t able to find a workaround.

@jbytecode recommends using threads for this sort of problem, and I think I agree.

2 Likes

Thank you very much!
How to organize the flow if there is overlap between the two threads. For example, if the execution of one thread at each time-step relates to the output of the other thread, as below.

``````using LinearAlgebra, SparseArrays, .Threads

tmin=1;
tmax=10000;
D1 = sprand(10000,0.1);
D2 = sprand(10000,0.1);
X = zeros(10000);
Y = zeros(10000);
for n in tmin+1:tmax
X[n] = D1[n]*D2[n] + Y[n-1];      # I want to make it as Thread-1
Y[n] = D1[n-1]*D2[n-1] + X[n-1];  # I want to make it as Thread-2
end

``````

You should probably check out Dagger.jl (Home · Dagger.jl). You can essentially spawn tasks that depend on previous tasks and it will only start executing once all the dependencies are also finished.

EDIT: Probably not needed here, but may be of use to OP in future.

1 Like

``````for n in tmin+1:tmax
Xn = Threads.@spawn D1[n]*D2[n] + Y[n-1];
Yn = Threads.@spawn D1[n-1]*D2[n-1] + X[n-1];
X[n] = fetch(Xn)
Y[n] = fetch(Yn)
end
``````
2 Likes

Thank you for your valuable response!

• Is creating the task inside the `for loop` time-expensive?
• Does include the `for loop` inside each task and make them communicate at the end of each iteration is more efficient to exclude the overhead? If so, what is the organization for that?

Executing things in parallel always comes with a cost. The way to alleviate this is to make sure you only use it when the operations are much more expensive than the overhead. I would suggest benchmarking both approaches (until that point, it’s just a heuristic).

In the specific code example, it seems that there are only some floating point operations happening, which are incredibly cheap. Trying to parallelise something that small will probably lead to a 10-100x slowdown, as the overhead is so much larger.

In this case, since the next iteration of the for loop directly depends on the previous iteration, you will only be able to do at most two tasks at once, and even then, the tasks are short.

In this case, I think optimising for serial performance will be better than trying to do it in parallel. A problem that is theoretically able to be parallelised often should not be as it causes huge performance hits in other areas.

To move forward, I would move both implementations into two functions and benchmark both with something like BenchmarkTools.jl

3 Likes

If I want to make one task execute (while the other is waiting for it). can I do as in the below using the parameter `mutex`? If no, what is the suitable technique?

``````function thr1(tmin,tmax,mutex,sum)
for n in tmin+1:tmax
if mutex == false
sum += 1;
mutex = true;
end
end
end

function thr2(tmin,tmax,mutex,sum)
for n in tmin+1:tmax
if mutex == true
sum += 2;
mutex = false;
end
end
end

sum = 0;
mutex = false;
tmin=1;
tmax=10000;
fetch(x1)
fetch(x2)
``````

You want to use a SpinLock instead of a binary variable (Multi-Threading · The Julia Language).

``````lock(myspinlock) do
... Calculations
end
``````

Instead of the `if mutex==true`. This lock lets a thread wait.

1 Like

Thank you very much, I will have a look!

I want to double check the execution inside the `for loop`.
1- create a spawn and assign it to Xn (does the execution start immediatly?)
2- create another spawn and assign it to Yn (does the execution start immediatly?)
3- Wait until Xn finishes
4- Wait until Yn is finishes

Yes, the first two lines inside the loop immediately schedule the task, and moves on with execution on the current thread. The fetch just waits for the result as the next iteration of the loop depends on the current one finishing.

1 Like

Thank you very much for your confirmation.

1 Like

[/quote]

I tried to make the above sequential code work on parallel as below (I know that there could be another method). However, it freezes (probably at one of the two while loops).Could you please correct the code to make a change in `flag, X,Y` in one thread be seen with the other thread?

``````function thr1(tmin,tmax,D1,D2,X,Y,flag,c)
for n in tmin+1:tmax
if flag[] == false
X[n] = D1[n]*D2[n] + Y[n-1];
lock(c)
flag[] = true;
unlock(c)
end
while flag[] == true
end
end
end

function thr2(tmin,tmax,D1,D2,X,Y,flag,c)
for n in tmin+1:tmax
while flag[] == false
end
if flag[] == true
Y[n] = D1[n-1]*D2[n-1] + X[n-1];
lock(c)
flag[] = false;
unlock(c)
end
end
end