Computer-specific slowdown with multi-threading on a computer cluster (Linux)?

I had the same problem. In my case, this discussion clarified the issue.
I now use worker processes (via Distributed) for the more expensive functions. The less expensive functions are handled with threads, using ThreadsX.map() from the ThreadsX.jl package.
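
Roughly, the split looks like the sketch below. This is not my actual code: `expensive` and `cheap` are hypothetical stand-ins, and the two local workers from `addprocs(2)` are an assumption (on a cluster you would add workers through your scheduler or a machine file instead). It also assumes ThreadsX.jl is installed and that Julia was started with more than one thread, e.g. `julia -t 4`.

```julia
using Distributed

# Assumption: two local worker processes, just for illustration.
addprocs(2)

using ThreadsX  # third-party package; must be installed in the active environment

# Hypothetical stand-in for an expensive function.
# It must be defined on every worker, hence @everywhere.
@everywhere expensive(x) = sum(sqrt(x * k) for k in 1:10^5)

# Hypothetical stand-in for a cheap function; only needed on the main process.
cheap(x) = x^2 + 1

# Expensive calls: distributed over the worker processes.
heavy = pmap(expensive, 1:8)

# Cheap calls: distributed over the threads of the current process.
light = ThreadsX.map(cheap, 1:1000)
```

Both calls return the results in input order, so they can replace a plain `map` without further changes. Whether processes or threads win for a given function probably depends on the machine, so it is worth timing both on the cluster itself.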