Large ODE System Solving using @threads Crashes with Different Thread Counts

Hi,

I’m currently working on solving a discretized PDE-ODE coupled system. The PDE has been discretized into a system of ODEs, to which we then add noise in the form of a stochastic process, converting the ODE system into a system of RODEs. I’m solving the system in Julia using the built-in ImplicitEuler() method. Because of the large number of equations, I use multithreading, namely the @threads macro, to parallelize the for loop that builds the right-hand side and the Jacobian of the system. I also limit the number of BLAS threads to the same number of threads used for the parallel loops.
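For reference, the structure is roughly like the sketch below; the names are placeholders rather than my actual code, and the noise term is omitted for brevity.

```julia
using LinearAlgebra, DifferentialEquations

# Keep BLAS from oversubscribing: match its thread count to the Julia
# threads used in the parallel loops.
LinearAlgebra.BLAS.set_num_threads(Threads.nthreads())

# Placeholder in-place RHS: each component is independent, so the loop
# over equations is parallelized with @threads.
function rhs!(du, u, p, t)
    Threads.@threads for i in eachindex(u)
        du[i] = -p.k[i] * u[i]      # stand-in for the discretized PDE terms
    end
    return nothing
end

n = 10_000
p = (k = rand(n),)
prob = ODEProblem(rhs!, rand(n), (0.0, 1.0), p)
sol = solve(prob, ImplicitEuler())
```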

I have been running into strange behaviour when solving the system. For the RNG, I use the same seed across all simulations, regardless of thread count, so that the simulations should be identical each time. When I run with different numbers of threads, set via the --threads flag on the command line, some simulations crash with a SingularException(0) and others never finish: the progress bar, which I use for all simulations, shows an increasing ETA while the bar itself never moves. Interestingly, when running single-threaded, the simulations never fail and never get stuck. The issue persists across different workstations running different architectures.

Machine 1: Intel Xeon E5-1660 and 32 GB of RAM, running Ubuntu 18.04

Machine 2: AMD Ryzen 5 3600X and 16 GB of RAM, running Ubuntu 20.04

For one class of simulations, both machines completed the run with thread counts 1 and 4; with thread count 2, machine 1 failed with the singular exception while machine 2 passed; and with thread count 6, both machines got stuck. These results were reproduced when I re-ran the simulations.

With a different RNG realization, all runs passed except machine 2 with thread count 6.

So I am wondering whether anyone else has run into this issue and whether there are any possible solutions.

Most likely your code is not thread-safe. A simple test: check whether your function gives the same output in the multi-threaded and single-threaded versions.

What can you do about it:

  • Avoid mutation / make types immutable (use a functional programming style)
  • copy() all variables inside the individual threads before using them (this causes allocations and may hurt performance); see the sketch after this list
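A minimal sketch of the copy-per-thread idea, with placeholder names (tmp_template stands in for whatever shared buffer the loop would otherwise mutate):

```julia
# Sketch: every iteration takes its own private copy of a shared buffer
# before mutating it, so concurrent tasks never write to the same array.
function rhs_copying!(du, u, tmp_template)
    Threads.@threads for i in eachindex(u)
        tmp = copy(tmp_template)   # private copy; allocates on every iteration
        tmp .+= u[i]               # placeholder work on the private buffer
        du[i] = sum(tmp)
    end
    return du
end
```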

If you’re using caches, are they per-thread? 99% of issues of this sort come from getting this wrong. I think this video (The Basics of Single Node Parallel Computing - YouTube) shows exactly how and why this occurs.
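One common pattern, sketched here with placeholder names (not code from this thread): partition the loop into chunks and give each chunk its own scratch buffer instead of sharing one cache across threads.

```julia
# Sketch: one chunk per thread, each chunk owning its own scratch buffer,
# so no two tasks ever share a cache.
function rhs_chunked!(du, u)
    nchunks = Threads.nthreads()
    chunks = Iterators.partition(eachindex(u), cld(length(u), nchunks))
    Threads.@threads for chunk in collect(chunks)
        cache = similar(u)          # scratch buffer owned by this chunk only
        for i in chunk
            cache .= u .* i         # placeholder work using the cache
            du[i] = sum(cache)
        end
    end
    return du
end
```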


Thanks for the quick replies. I’ll try out the thread-safety suggestions and look into the caches. Hopefully that fixes the problem.

Thanks,
Jack

Thank you. I compared my multi-threaded output against the single-threaded output and they are the same across multiple state vectors u. I used == and it returned true.

Your video was very informative, thank you. I do use caches for two different arrays, but both are mutated only in serial and only once per function call; in the threaded for loop I only read the cache elements. I get the same results for the single- and multi-threaded RHS and Jacobians when testing outside of an ODE solve. Does BLAS behave differently with different thread counts?

And what if you call it multiple times?

So I just ran two tests: one calling the multi- and single-threaded versions 100 times with the same state vector each time, and one with a different state vector each time. I drove both with a serial for loop, and the outputs matched at every iteration.
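Roughly, the check looked like the sketch below (rhs_serial! and rhs_threaded! are stand-ins for my actual functions, and the sizes are illustrative):

```julia
using Random

# Repeated-call comparison: feed the same inputs through both versions and
# check that the outputs agree bitwise at every iteration.
Random.seed!(1234)
n = 1_000
du_s, du_t = zeros(n), zeros(n)
for iter in 1:100
    u = rand(n)              # fresh state vector each iteration
    rhs_serial!(du_s, u)     # single-threaded reference
    rhs_threaded!(du_t, u)   # multi-threaded version under test
    @assert du_s == du_t "mismatch at iteration $iter"
end
```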

Thanks

I tightened the tolerances of the ODE solver, which has seemingly fixed the issue. Thanks everyone for the help.
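For anyone finding this later, tightening the tolerances just means passing stricter abstol/reltol keywords to solve; the values below are only an example, not necessarily the ones used here.

```julia
# Tighter than the defaults (reltol = 1e-3, abstol = 1e-6)
sol = solve(prob, ImplicitEuler(); abstol = 1e-9, reltol = 1e-7)
```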