I’m trying to multithread an elaborate test over two nested for-loops. The important piece of code is:
Threads.@threads for i in 1:length(t_values)
    Threads.@threads for μ in 1:P
        m_values[μ, i] = run_test(i, μ, t0s[i, μ], u0s[i, μ], ξ_array[i, μ, :, :])
    end
end
This gives a completely different answer than the non-parallel version (with the @threads lines removed). The only global variables that run_test uses (as far as I’m aware) are parameters passed to a DifferentialEquations.jl solve. Could race conditions on accessing those parameters be the problem here, or might something else be going on? And if so, how do I fix it? Is multithreading even the right tool in this kind of situation, or should I parallelize in a different way when the function to parallelize is complex?
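To make that worry concrete, the kind of pattern I am asking about would look something like the following. This is a made-up sketch, not my actual code; rhs! and the parameter vector p are invented names:

using DifferentialEquations

# Hypothetical sketch of the race I am asking about, not my real run_test.
const p = [1.0]                              # globally shared parameter vector
rhs!(du, u, p, t) = (du[1] = -p[1] * u[1])   # made-up right-hand side

function run_test(i, μ, t0, u0, ξ)
    p[1] = t0 + μ          # every thread writes into the same global `p`...
    prob = ODEProblem(rhs!, [u0], (t0, t0 + 1.0), p)
    sol = solve(prob)      # ...so this solve may read a value that another
    return sol.u[end][1]   # thread has already overwritten: a race condition
end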
You are right, that does make it conceptually more difficult. The reason I coded it this way is that both loops are sometimes small and run over ranges that are not divisible by the number of threads. If it works properly, nesting @threads should be faster without needing to construct a complicated iterable. The reason I thought it would work is the thread Parallelize nested loop in v1.7.2. Just from printing threadid(), μ and i inside the loop body, it seems to divide the different parameter combinations up fine.
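For completeness, this is the kind of flattened single-loop version I was trying to avoid, although with CartesianIndices it turns out not to be that complicated (a sketch using the same array names as above):

# Flatten the two loops into one index space so a single @threads call
# distributes all (μ, i) combinations, even when the individual ranges are
# small or not divisible by the number of threads.
Threads.@threads for idx in CartesianIndices((P, length(t_values)))
    μ, i = Tuple(idx)
    m_values[μ, i] = run_test(i, μ, t0s[i, μ], u0s[i, μ], ξ_array[i, μ, :, :])
end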
Anyway, I get the same problem when parallelizing only one of the loops, which I of course checked.
Also, all the MWEs with a similar structure that I have tried to build so far have worked. My question is basically where to look for a race condition in my large chunk of code. Could one occur, for example, if I have a global variable N that is constant but used inside the code? And could it happen if I pass the same parameter p to run_test in every iteration? My guess is yes, but I would expect that a) such races are rare, and b) Julia has easy ways to prevent them, which I sadly could not find in the documentation. It also seems to me that the compiler could be clever enough to avoid these types of race conditions, but I don’t know in which cases it does. I should have focused my question more in that direction.
To try to give some guidance on this part of your question: I don’t see where a race condition could occur just from the code you posted. A race condition happens when multiple threads write to the same piece of memory, so merely reading a constant value is totally fine. Your original snippet also shows that you pass completely independent values/arrays to run_test, so everything is fine on that level. Are you sharing or reusing some data structures among different runs?
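If you do reuse something, the typical culprit is a single preallocated buffer or cache that every call of run_test writes into. A small invented example of that pattern, together with one possible fix (the workspace name is hypothetical, just for illustration):

# Hypothetical example of a shared, reused data structure causing a race.
workspace = zeros(100)                 # one buffer shared by all threads

function run_test_shared(i, μ)
    workspace .= i .+ μ                # every thread overwrites the same memory,
    return sum(workspace)              # so this may read another thread's values
end

# One possible fix: give each call its own workspace, so nothing is shared.
function run_test_private(i, μ)
    local_workspace = zeros(100)       # allocated per call, nothing is shared
    local_workspace .= i .+ μ
    return sum(local_workspace)
end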