PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()

Thank you for this write-up.

Could you comment on the following cases:

  • When the running time of each iteration is highly unpredictable, how can one select a good value for chunks_per_thread?

  • What can be done when nested loops can each be parallelized?

OpenCilk's cilk_for handles both problems very elegantly, using work stealing and a loop grainsize hint. Can we use such techniques in Julia as well?