Is `pmap` _both_ distributed and threaded?

My typical cluster setup is 20-60 48-core AWS nodes

If I use pmap() to distribute work across them, will the cluster workers also distribute their chunk of the map operation among their threads?

No. It’s only distributed across Julia workers.

You might want to take a look at Dagger.jl which will utilize both workers and threads if possible.

(In principle, you could also start one Julia worker per core, of course)
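To make the answer concrete, here is a minimal sketch of what `pmap` does: it fans elements out across the Distributed workers, and each call runs on whichever single worker process receives it (the worker counts and the `slow_f` body below are illustrative, not from the thread):

```julia
using Distributed
addprocs(4)  # e.g. 4 local workers; on a cluster you'd use a ClusterManager instead

@everywhere function slow_f(x)
    # This body runs on one worker process. pmap does NOT further
    # split it across that worker's threads.
    sum(sqrt(i) for i in 1:x)
end

results = pmap(slow_f, 1:100)  # distributed across the 4 workers only
```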


Thanks!

At a glance, Dagger looks a little complex for just filling an array with the results of a (slow) function using threads + Distributed. It's probably easier for me to break up the work myself and use explicit threading in the jobs I spawn.
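The "break up the work myself" approach can be sketched as follows: partition the input into one chunk per worker, `pmap` the chunks, and thread within each chunk. The chunk sizes, worker counts, and `slow_f` stand-in below are assumptions for illustration; on a real cluster the workers would be started with something like `--threads=48`:

```julia
using Distributed
# Hypothetical: one multithreaded worker per node.
addprocs(2; exeflags=`--threads=4`)

@everywhere begin
    slow_f(x) = sum(sqrt(i) for i in 1:x)  # stand-in for the real slow function

    function threaded_chunk(xs)
        out = Vector{Float64}(undef, length(xs))
        Threads.@threads for i in eachindex(xs)
            out[i] = slow_f(xs[i])  # explicit threading inside the spawned job
        end
        return out
    end
end

# One chunk per worker; pmap distributes chunks, threads work within them.
chunks = collect(Iterators.partition(1:1_000, 500))
results = reduce(vcat, pmap(threaded_chunk, chunks))
```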

If the function is slow and allocates memory, and you’re already using Distributed anyway, then it’d probably be fastest to use one worker per core and make all workers single threaded.

Julia’s GC is single-threaded, so garbage collection scales poorly with threads. Using Distributed (e.g. pmap) instead of multiple threads is therefore a workaround that often yields much better performance than multithreading in practice (at least when you have a lot of cores available).
The downside is mostly that Distributed takes more effort to use (e.g. @everywhere) than threads do, but if you’re already using it anyway, I don’t see the benefit of threads (unless you also need a lot of communication among the workers).
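The one-single-threaded-worker-per-core setup described above might look like this (the worker count and the allocating toy function are assumptions for illustration; each worker process then garbage-collects independently):

```julia
using Distributed
# One single-threaded worker per core, matching a 48-core node:
addprocs(48; exeflags=`--threads=1`)

@everywhere slow_alloc(n) = sum(abs2, randn(n))  # allocates a temporary array

# Each call runs in its own process, so GC pressure is per-process
# rather than contending on one shared single-threaded GC:
results = pmap(slow_alloc, fill(100_000, 1_000))
```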


I’m a game programmer, and my Julia code looks like my C++ code: SIMD, stack allocation, and I always look at the disassembly. The cache misses from running 96 copies of my code and data alone would kill any scalability. I use Julia more as an interactive C++/assembly language and a way to glue my C++ code together with good visualization tools than as a GC-style language.
