Did you benchmark to see whether the overhead is something you even need to worry about?
I once opened this PR, https://github.com/JuliaLang/julia/pull/33964, which made @threads
a no-op if nthreads()==1. But apparently that was not safe, and no workaround was suggested. The roadmap PR is linked in that PR of mine.
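
On the benchmarking question, here is a rough sketch of how one could compare a plain loop against the same loop under `@threads`, using only Base. The function names `serial_fill!` and `threaded_fill!`, the array size, and the repeat count are made up for illustration; swap in your real loop body. Timings will depend on your machine and on how many threads Julia was started with.

```julia
using Base.Threads

# Hypothetical workload: the same loop body, once serial and once threaded.
function serial_fill!(out, xs)
    for i in eachindex(xs)
        out[i] = sin(xs[i])
    end
    return out
end

function threaded_fill!(out, xs)
    @threads for i in eachindex(xs)
        out[i] = sin(xs[i])
    end
    return out
end

xs = rand(1_000)      # small input, so any per-call overhead is visible
out = similar(xs)

# Warm up so compilation time is not included in the measurement.
serial_fill!(out, xs)
threaded_fill!(out, xs)

# Take the minimum over many runs to reduce noise.
t_serial   = minimum(@elapsed(serial_fill!(out, xs)) for _ in 1:100)
t_threaded = minimum(@elapsed(threaded_fill!(out, xs)) for _ in 1:100)

println("nthreads = ", nthreads())
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

If the two numbers are close for your actual workload with one thread, the overhead is probably not worth worrying about. For more careful measurements, the BenchmarkTools package is the usual choice.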