using SparseArrays

function f(n, d)
    A1 = sprand(n, n, d/n); x1 = rand(n)
    A2 = sprand(n, n, d/n); x2 = rand(n)
    # @task wraps its expression in a closure, so r1 and r2 must be declared
    # in the enclosing scope for the assignments below to be visible here
    local r1, r2
    # Create the tasks; each one stores its result in r1 or r2
    t1 = @task r1 = A1*x1
    t2 = @task r2 = A2\x2
    # Schedule and wait
    schedule.([t1, t2])
    wait.([t1, t2])
    return r1, r2
end
#n = 1000; d = 10;
#r1, r2 = f(n, d)
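A task does carry the value of its expression, and `fetch` returns it, so the `local` declaration can be avoided. Here is a minimal sketch of that variant, assuming `Threads.@spawn` is acceptable in place of `@task` plus `schedule`; the function name `multiply_and_solve` and the small dense inputs are made up for illustration:

```julia
using LinearAlgebra

# Hypothetical helper: runs the product and the solve as two tasks
# and returns both results via fetch.
function multiply_and_solve(A1, x1, A2, x2)
    t1 = Threads.@spawn A1 * x1     # scheduled as soon as it is created
    t2 = Threads.@spawn A2 \ x2
    return fetch(t1), fetch(t2)     # fetch waits and returns the task's value
end

# Small inputs chosen so the results are easy to verify by hand
A1 = [1.0 2.0; 3.0 4.0]; x1 = [1.0, 1.0]
A2 = [2.0 0.0; 0.0 4.0]; x2 = [2.0, 4.0]
r1, r2 = multiply_and_solve(A1, x1, A2, x2)
```

The same function should accept the `sprand` matrices from the post, provided `A2` is not singular.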
The best approach depends on your exact problem. It looks like you’re working with tasks that have widely different run times, and that will be significant as you scale things up.
I agree that most of the tools and documentation in Julia are intended for tasks that can be broken up evenly and iteratively. Figuring out how best to solve your particular problem might take a lot of creativity.
Thank you very much!
How should I organize the flow if the two threads depend on each other? For example, if each time step of one thread depends on the output of the other thread, as below.
Your feedback is really appreciated.
using LinearAlgebra, SparseArrays, Base.Threads
tmin = 1;
tmax = 10000;
D1 = sprand(10000, 0.1);
D2 = sprand(10000, 0.1);
X = zeros(10000);
Y = zeros(10000);
for n in tmin+1:tmax
    X[n] = D1[n]*D2[n] + Y[n-1];     # I want to make it as Thread-1
    Y[n] = D1[n-1]*D2[n-1] + X[n-1]; # I want to make it as Thread-2
end
You should probably check out Dagger.jl. You can essentially spawn tasks that depend on previous tasks, and each one only starts executing once all of its dependencies have finished.
EDIT: Probably not needed here, but may be of use to OP in future.
Is creating a task inside the for loop expensive?
Would it be more efficient to put the for loop inside each task and have the tasks communicate at the end of each iteration, to avoid that overhead? If so, how should that be organized?
Executing things in parallel always comes with a cost. The way to alleviate this is to make sure you only use it when the operations are much more expensive than the overhead. I would suggest benchmarking both approaches (until that point, it’s just a heuristic).
In the specific code example, it seems that there are only some floating point operations happening, which are incredibly cheap. Trying to parallelise something that small will probably lead to a 10-100x slowdown, as the overhead is so much larger.
In this case, since the next iteration of the for loop directly depends on the previous iteration, you will only be able to do at most two tasks at once, and even then, the tasks are short.
So I think optimising for serial performance will be better than trying to do it in parallel. A problem that can theoretically be parallelised often should not be, because the parallel version causes large performance hits elsewhere.
To move forward, I would move both implementations into two functions and benchmark both with something like BenchmarkTools.jl.
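As a rough sketch of that comparison: the two functions below wrap the serial loop and a spawn-per-step version of it. The function names are hypothetical, dense `rand` vectors stand in for the `sprand` ones, and Base's `@elapsed` is used only to keep the example dependency-free (`@btime` from BenchmarkTools.jl would give more reliable numbers).

```julia
# Serial version of the loop from the question
function serial_update!(X, Y, D1, D2, tmin, tmax)
    for n in tmin+1:tmax
        X[n] = D1[n]*D2[n] + Y[n-1]
        Y[n] = D1[n-1]*D2[n-1] + X[n-1]
    end
    return X, Y
end

# Two tasks per time step; both only read values from step n-1
function tasked_update!(X, Y, D1, D2, tmin, tmax)
    for n in tmin+1:tmax
        xn = Threads.@spawn D1[n]*D2[n] + Y[n-1]
        yn = Threads.@spawn D1[n-1]*D2[n-1] + X[n-1]
        X[n] = fetch(xn)
        Y[n] = fetch(yn)
    end
    return X, Y
end

N = 2_000
D1 = rand(N); D2 = rand(N)

# First calls include compilation, so run each once before timing
Xs, Ys = serial_update!(zeros(N), zeros(N), D1, D2, 1, N)
Xt, Yt = tasked_update!(zeros(N), zeros(N), D1, D2, 1, N)

t_serial = @elapsed serial_update!(zeros(N), zeros(N), D1, D2, 1, N)
t_tasked = @elapsed tasked_update!(zeros(N), zeros(N), D1, D2, 1, N)
println("serial: ", t_serial, " s, tasked: ", t_tasked, " s")
```

Both versions should produce identical `X` and `Y`; only the timings differ, and those will depend on the machine and on the number of threads Julia was started with.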
Thanks for your feedback.
If I want one task to execute while the other waits for it, can I do it as below, using the mutex parameter? If not, what is the appropriate technique?
function thr1(tmin,tmax,mutex,sum)
    for n in tmin+1:tmax
        if mutex == false
            sum += 1;
            mutex = true;
        end
    end
end
function thr2(tmin,tmax,mutex,sum)
    for n in tmin+1:tmax
        if mutex == true
            sum += 2;
            mutex = false;
        end
    end
end
sum = 0;
mutex = false;
tmin=1;
tmax=10000;
x1 = Threads.@spawn thr1(tmin,tmax,mutex,sum)
x2 = Threads.@spawn thr2(tmin,tmax,mutex,sum)
fetch(x1)
fetch(x2)
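One thing to note about the code above: `mutex` and `sum` are a plain `Bool` and `Int`, so assigning to them inside `thr1` or `thr2` only rebinds the function's own local variable, and neither the other task nor the caller sees the change. A possible alternative, sketched here under the assumption that strict turn-taking is what is wanted, is to pass the turn back and forth through two `Channel`s and keep the total in a `Threads.Atomic`. All names (`worker`, `run_turns`, `turn_a`, `turn_b`, `total`) are hypothetical:

```julia
# Each worker blocks until it receives the turn, does its update,
# then hands the turn to the other worker.
function worker(my_turn::Channel, other_turn::Channel, total, inc, steps)
    for _ in 1:steps
        take!(my_turn)                    # wait for this task's turn
        Threads.atomic_add!(total, inc)   # shared, thread-safe counter
        put!(other_turn, true)            # pass the turn on
    end
end

function run_turns(steps)
    turn_a = Channel{Bool}(1)             # buffered so the final put! does not block
    turn_b = Channel{Bool}(1)
    total = Threads.Atomic{Int}(0)
    put!(turn_a, true)                    # task A goes first
    a = Threads.@spawn worker(turn_a, turn_b, total, 1, steps)
    b = Threads.@spawn worker(turn_b, turn_a, total, 2, steps)
    wait(a); wait(b)
    return total[]
end

result = run_turns(100)   # 100 turns each, adding 1 and 2 alternately
```

With strict alternation only one task is ever doing work at a time, so this gives ordering, not a speedup.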
I want to double check the execution inside the for loop.
1- Create a spawned task and assign it to Xn (does execution start immediately?)
2- Create another spawned task and assign it to Yn (does execution start immediately?)
3- Wait until Xn finishes
4- Wait until Yn finishes
Please correct me if I'm wrong.
Yes, the first two lines inside the loop immediately schedule their tasks, and execution on the current thread moves on. The fetch calls just wait for the results, since the next iteration of the loop depends on the current one finishing.
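A small sketch of that behaviour, using a made-up task that sleeps briefly so the difference between scheduling and finishing is visible (the variable names are for illustration only):

```julia
# @spawn returns right away; the task is queued to run in the background
t = Threads.@spawn begin
    sleep(0.2)    # stand-in for real work
    42
end

done_before = istaskdone(t)   # most likely false: the task is still sleeping
value = fetch(t)              # blocks here until the task has finished
done_after = istaskdone(t)    # true once fetch has returned
```

Whether the task has actually started running by the next line depends on the scheduler and on how many threads are available, which is why `done_before` is only "most likely" false.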
I tried to make the sequential code above run in parallel, as below (I know there could be another method). However, it freezes, probably in one of the two while loops. Could you please correct the code so that a change to flag, X, or Y made in one thread is seen by the other thread?
function thr1(tmin,tmax,D1,D2,X,Y,flag,c)
    for n in tmin+1:tmax
        if flag[] == false
            X[n] = D1[n]*D2[n] + Y[n-1];
            lock(c)
            flag[] = true;
            unlock(c)
        end
        while flag[] == true
        end
    end
end
function thr2(tmin,tmax,D1,D2,X,Y,flag,c)
    for n in tmin+1:tmax
        while flag[] == false
        end
        if flag[] == true
            Y[n] = D1[n-1]*D2[n-1] + X[n-1];
            lock(c)
            flag[] = false;
            unlock(c)
        end
    end
end
c = Base.Threads.Condition();
flag = Ref(false);
t1 = Threads.@spawn thr1(tmin,tmax,D1,D2,X,Y,flag,c)
t2 = Threads.@spawn thr2(tmin,tmax,D1,D2,X,Y,flag,c)
fetch(t1)
fetch(t2)
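A likely cause of the freeze is that `flag` is a plain `Ref`: reads of it in the empty `while` loops are not synchronized, so the compiler is free to load it once and spin forever, and the loops never yield to the scheduler. The lock only protects the writes, not those reads. One possible fix, sketched below, keeps the same alternating structure but uses a `Threads.Atomic{Bool}` flag and calls `yield()` while waiting. The function names and the small problem size are assumptions for illustration:

```julia
using SparseArrays

# Computes X[n]; runs when flag is false, then hands over to thr2!
function thr1!(X, Y, D1, D2, flag, tmin, tmax)
    for n in tmin+1:tmax
        while flag[]          # atomic load: wait for thr1's turn
            yield()           # let other tasks run while waiting
        end
        X[n] = D1[n]*D2[n] + Y[n-1]
        flag[] = true         # atomic store: thr2's turn
    end
end

# Computes Y[n]; runs when flag is true, then hands back to thr1!
function thr2!(X, Y, D1, D2, flag, tmin, tmax)
    for n in tmin+1:tmax
        while !flag[]
            yield()
        end
        Y[n] = D1[n-1]*D2[n-1] + X[n-1]
        flag[] = false
    end
end

function run_alternating(D1, D2, tmin, tmax)
    X = zeros(tmax); Y = zeros(tmax)
    flag = Threads.Atomic{Bool}(false)
    t1 = Threads.@spawn thr1!(X, Y, D1, D2, flag, tmin, tmax)
    t2 = Threads.@spawn thr2!(X, Y, D1, D2, flag, tmin, tmax)
    wait(t1); wait(t2)
    return X, Y
end

N = 200
D1 = sprand(N, 0.1); D2 = sprand(N, 0.1)
Xp, Yp = run_alternating(D1, D2, 1, N)
```

This should reproduce the serial results, but because the two tasks strictly take turns it is not expected to be faster than the plain loop.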