Large ODE System Solving using @threads Crashes with Different Thread Counts

Hi,

I’m currently working on solving a discretized PDE-ODE coupled system. The PDE has been discretized into a system of ODEs, to which we then add noise in the form of a stochastic process, converting the ODE system into a system of RODEs. I’m solving the system in Julia using the built-in ImplicitEuler() method. Because of the large number of equations, I use multithreading, namely the @threads macro, to parallelize the for loop that builds the right-hand side and the Jacobian of the system. I also limit the number of BLAS threads to the same number of threads used for the parallel loops.
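For reference, the structure is roughly like the sketch below; the names are placeholders rather than my actual code, and the noise term is omitted for brevity.

```julia
using LinearAlgebra, DifferentialEquations

# Keep BLAS from oversubscribing: match its thread count to the Julia
# threads used in the parallel loops.
LinearAlgebra.BLAS.set_num_threads(Threads.nthreads())

# Placeholder in-place RHS: each component is independent, so the loop
# over equations is parallelized with @threads.
function rhs!(du, u, p, t)
    Threads.@threads for i in eachindex(u)
        du[i] = -p.k[i] * u[i]      # stand-in for the discretized PDE terms
    end
    return nothing
end

n = 10_000
p = (k = rand(n),)
prob = ODEProblem(rhs!, rand(n), (0.0, 1.0), p)
sol = solve(prob, ImplicitEuler())
```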

I have been running into strange behaviour when solving the system. For the RNG, I use the same seed across all simulations, regardless of thread count, so that the simulations should be identical each time. When I run with different numbers of threads, set via the --threads flag on the command line, some simulations crash with a SingularException(0) and others never finish: the progress bar, which I use for all simulations, shows an increasing ETA while the bar itself never moves. Interestingly, when running single-threaded, the simulations never fail and never get stuck. The issue persists across different workstations running different architectures.

Machine 1: Intel Xeon E5-1660 and 32 GB of RAM, running Ubuntu 18.04

Machine 2: AMD Ryzen 5 3600X and 16 GB of RAM, running Ubuntu 20.04

For one class of simulations, both machines completed the run with thread counts 1 and 4; with thread count 2, machine 1 failed with the singular exception while machine 2 passed; and with thread count 6, both machines got stuck. These results were reproduced when I re-ran the simulations.

With a different RNG realization, all runs passed except machine 2 with thread count 6.

So I am wondering whether anyone else has run into this issue and whether there are any possible solutions.

Most likely your code is not thread-safe. A simple test: check whether your function gives the same output in the multi-threaded and single-threaded versions.

What can you do about it:

  • Avoid mutation / make types immutable (use a functional programming style)
  • copy() all variables inside the individual threads before using them (this causes allocations and may hurt performance); see the sketch after this list
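A minimal sketch of the copy-per-thread idea, with placeholder names (tmp_template stands in for whatever shared buffer the loop would otherwise mutate):

```julia
# Sketch: every iteration takes its own private copy of a shared buffer
# before mutating it, so concurrent tasks never write to the same array.
function rhs_copying!(du, u, tmp_template)
    Threads.@threads for i in eachindex(u)
        tmp = copy(tmp_template)   # private copy; allocates on every iteration
        tmp .+= u[i]               # placeholder work on the private buffer
        du[i] = sum(tmp)
    end
    return du
end
```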

If you’re using caches, are they per-thread? 99% of issues of this sort come from getting this wrong. I think this video (The Basics of Single Node Parallel Computing - YouTube) shows exactly how and why this occurs.
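One common pattern, sketched here with placeholder names (not code from this thread): partition the loop into chunks and give each chunk its own scratch buffer instead of sharing one cache across threads.

```julia
# Sketch: one chunk per thread, each chunk owning its own scratch buffer,
# so no two tasks ever share a cache.
function rhs_chunked!(du, u)
    nchunks = Threads.nthreads()
    chunks = Iterators.partition(eachindex(u), cld(length(u), nchunks))
    Threads.@threads for chunk in collect(chunks)
        cache = similar(u)          # scratch buffer owned by this chunk only
        for i in chunk
            cache .= u .* i         # placeholder work using the cache
            du[i] = sum(cache)
        end
    end
    return du
end
```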


Thanks for the quick replies. I’ll try out the thread-safety suggestions and look into the caches. Hopefully that fixes the problem.

Thanks,
Jack

Thank you. I compared my multi-threaded output against the single-threaded output and they are the same across multiple state vectors u. I used == and it returned true.

Your video was very informative, thank you. I do use caches for two different arrays, but both are mutated only in serial and only once per function call; in the threaded for loop I only read the cache elements. I get the same results for the single- and multi-threaded RHS and Jacobians when testing outside of an ODE solve. Does BLAS behave differently with different thread counts?

And what if you call it multiple times?

So I just ran two tests: one calling the multi- and single-threaded versions 100 times with the same state vector each time, and one with a different state vector each time. I drove both with a serial for loop, and the outputs matched at every iteration.
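Roughly, the check looked like the sketch below (rhs_serial! and rhs_threaded! are stand-ins for my actual functions, and the sizes are illustrative):

```julia
using Random

# Repeated-call comparison: feed the same inputs through both versions and
# check that the outputs agree bitwise at every iteration.
Random.seed!(1234)
n = 1_000
du_s, du_t = zeros(n), zeros(n)
for iter in 1:100
    u = rand(n)              # fresh state vector each iteration
    rhs_serial!(du_s, u)     # single-threaded reference
    rhs_threaded!(du_t, u)   # multi-threaded version under test
    @assert du_s == du_t "mismatch at iteration $iter"
end
```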

Thanks

I tightened the tolerances of the ODE solver, which has seemingly fixed the issue. Thanks everyone for the help.
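For anyone finding this later, tightening the tolerances just means passing stricter abstol/reltol keywords to solve; the values below are only an example, not necessarily the ones used here.

```julia
# Tighter than the defaults (reltol = 1e-3, abstol = 1e-6)
sol = solve(prob, ImplicitEuler(); abstol = 1e-9, reltol = 1e-7)
```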