Overhead of `Threads.@threads`

Yes, there is an overhead. If I remove all `Threads.@threads`, one right-hand-side evaluation takes the following time for different numbers of DG elements:

  #Elements | Runtime in seconds
          1 | 1.39e-06
          4 | 1.66e-06
         16 | 2.86e-06
         64 | 7.70e-06
        256 | 2.69e-05
       1024 | 1.04e-04
       4096 | 4.27e-04
      16384 | 1.87e-03
      65536 | 9.75e-03

If we keep `Threads.@threads` but run with a single thread, as we do now, I get:

  #Elements | Runtime in seconds
          1 | 1.86e-05
          4 | 1.87e-05
         16 | 2.03e-05
         64 | 2.68e-05
        256 | 4.62e-05
       1024 | 1.25e-04
       4096 | 4.44e-04
      16384 | 1.93e-03
      65536 | 9.67e-03
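To make the comparison explicit, here is a small Julia sketch that takes the timings from the two tables and computes the absolute overhead and the slowdown factor for each number of elements. The array names are my own; only the numbers come from the measurements.

```julia
# Timings copied from the two tables (seconds per right-hand-side evaluation).
const n_elements = [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536]
const t_serial   = [1.39e-06, 1.66e-06, 2.86e-06, 7.70e-06, 2.69e-05,
                    1.04e-04, 4.27e-04, 1.87e-03, 9.75e-03]   # no Threads.@threads
const t_threads1 = [1.86e-05, 1.87e-05, 2.03e-05, 2.68e-05, 4.62e-05,
                    1.25e-04, 4.44e-04, 1.93e-03, 9.67e-03]   # Threads.@threads, 1 thread

# Absolute overhead of the Threads.@threads loops (in seconds).
overhead = t_threads1 .- t_serial
# Relative slowdown factor (threaded / serial).
slowdown = t_threads1 ./ t_serial

for (n, dt, s) in zip(n_elements, overhead, slowdown)
    println(lpad(n, 6), " | ", round(dt * 1e6; digits = 2), " µs | x", round(s; digits = 2))
end
```

With these numbers the overhead is roughly constant at about 17 to 21 µs per call up to 4096 elements, which makes a single-element evaluation about 13 times slower. For 16384 and 65536 elements the difference is within measurement noise; at 65536 elements the threaded run is even marginally faster.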

Thanks for this link!