I’m not at that computer at the moment, but Threads.nthreads() returned 16 there.
Shouldn’t that be enough? (I ran it in Juno.)
(So far I have only played with your version of the inner loop, since my current machine doesn’t have enough threads to make it interesting.)