Multithreaded code on beefy computer runs just as fast as serial code on M1 Mac

Have you tried writing this code with DifferentialEquations.jl? I’d expect it to be a few orders of magnitude better, since it knows fancier time-stepping algorithms than you do.