When to use pmap vs Threads.@threads?

chelseas · December 9, 2021, 5:29am

Some of my processes may take a minute, and others milliseconds, so load balancing seems like a good idea. Do both do this?
Any other things I should know about when using pmap vs. @threads?

I’ve tagged jump because I will be copying a JuMP model to each thread using copy_model.

Tomas_Pevny · December 9, 2021, 6:14am

It depends on your application. If you allocate and deallocate a lot of memory in the threads (or workers), you might prefer pmap, since each worker process runs its own GC. On the other hand, if you need to read and write shared structures, you might want Threads.@threads. Also, if tasks can take different lengths of time, something like FLoops might be better than the static scheduler of Threads.@threads.

More information would be needed for a recommendation, but the best is always to test both approaches.

carstenbauer · December 9, 2021, 7:43am

No. pmap does, @threads does not. However, you could use Threads.@spawn (see https://nbviewer.org/github/carstenbauer/JuliaNRWSS21/blob/main/backup/load_balancing.ipynb for an example/comparison).

Note that you are comparing multithreading to multi-processing here.
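
To make the Threads.@spawn suggestion concrete, here is a minimal sketch, not taken from the linked notebook. The function `simulate_work` and the `sizes` vector are made up for illustration. Each item gets its own task, and Julia's scheduler assigns tasks to whichever thread is free, so one slow item does not hold up a fixed block of the loop.

```julia
using Base.Threads: @spawn

# Hypothetical workload: the cost grows with n, so items take very different times.
function simulate_work(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return s
end

# Made-up mix of expensive and cheap items.
sizes = [10^6, 10, 10, 10^5, 10, 10^6, 10, 10]

# One task per item; tasks are picked up by threads as they become free.
tasks = [@spawn simulate_work(n) for n in sizes]

# fetch waits for each task and returns its result, in the original order.
results = fetch.(tasks)
```

Spawning one task per item is reasonable when each item does a meaningful amount of work. For very many tiny items, batching several items per task is likely to be cheaper.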

chelseas · December 9, 2021, 8:07am

Thanks for the info! To be honest I’m not super sure what the difference between multi-threading and multi-processing is…and which would be better for my application.
It might be nice to have access to shared data structures (for reading) but I could also just duplicate these data structures. I don’t need access to any shared containers for writing.

lawless-m · December 9, 2021, 8:23am

One way to think about it, especially with Distributed on Julia, is pmap could be running on entirely different computers - maybe separated by an ocean.

Your Threads all run on the local machine.

chelseas · December 9, 2021, 8:31am

Ok, good to know. Can I actually run pmap across several servers?

Also, is there anything wrong with doing this? And does this construct have a name?

function wrapper(problem)
    function myfun(cell)
        # call functions that use info from problem and from cell
    end
    return myfun
end
pmap(wrapper(problem), cells)

lawless-m · December 9, 2021, 8:38am

I don’t know if that pattern has a specific name in Julia.

The general term for things like that is “Function Factory”.

https://adv-r.hadley.nz/function-factories.html

And I hope they are OK. I use them all the time.
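
A small runnable sketch of the pattern, under made-up assumptions: the `Problem` struct, its `scale` field, and `make_cell_solver` are hypothetical stand-ins for the real model and wrapper. The returned function is a closure that captures `problem`.

```julia
using Distributed

# Hypothetical problem data; a real one might hold a model and its settings.
struct Problem
    scale::Float64
end

# Function factory: returns a closure that captures `problem`.
function make_cell_solver(problem)
    return cell -> problem.scale * cell
end

problem = Problem(2.0)
cells = [1.0, 2.0, 3.0]

# With no extra worker processes added, pmap simply runs on the main process.
solved = pmap(make_cell_solver(problem), cells)
```

When real workers are in use, the definitions the closure depends on (here `Problem`) have to exist on every worker, typically by defining them under `@everywhere`. Whatever the closure captures is serialized and sent along with it, so capturing a very large object can be costly. Distributed's `CachingPool` is intended to reduce that cost by caching the function on each worker.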

chelseas · December 9, 2021, 8:58am

Does this “across an ocean” thing mean that I can’t access global variables? In addition to some copy-able variables, I have some really big data tables that I shouldn’t be copying with every cell that I pass to pmap.

carstenbauer · December 9, 2021, 8:59am

There have been many discussions on this here on Discourse (and, of course, elsewhere). See, for example, the topic “Multi-threading or multi-processing, how to know which to use and when?”

lawless-m · December 9, 2021, 10:32am

Yes, that was my point of describing it like that.

If you want separate Processes to have access to data, they need their own copy somehow.

So two processes needing the same 1 GB of in-memory data means 2 GB of RAM is required.

Whereas Threads can all read the same 1 GB.

Of course, the downside to that is the Threads can all (possibly) write to it too (if it is mutable), and orchestrating that becomes your problem to solve.
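
As an illustration of the shared-read case, here is a minimal sketch. The `table` array is a small made-up stand-in for a large data table. Every thread reads the same array without copying it, and each iteration writes only to its own output slot, so no locking is needed.

```julia
using Base.Threads: @threads

# Stand-in for a large read-only table; all threads see the same memory.
table = rand(1000, 50)

# One output slot per column, so no two iterations write to the same place.
col_sums = zeros(size(table, 2))

@threads for j in axes(table, 2)
    # @view avoids copying the column before summing it.
    col_sums[j] = sum(@view table[:, j])
end
```

If several iterations needed to update the same element, that update would need a lock or an atomic operation instead.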

chelseas · December 16, 2021, 7:40am

Oooo what if I use BOTH Threads.@threads and pmap in different parts of my code? Is it bad to start julia with both multiple threads (-t 20) and multiple processes (-p 20), for example?
And then in the main process, I first do something with shared memory using @threads and then later in the program I send work to my 20 workers?
Is there like a lot of overhead involved if I set both -t and -p >1 ?

lawless-m · December 16, 2021, 8:28am

No, do whatever solves your problem, there is no “one way to do things”.

In fact, mixing and matching will help you learn which to use next time.

And be sure to test your assumptions by running things on a single Thread / Process. What often seems like a good candidate to parallelize sees no benefit and may actually run slower because of the communication overhead and context switching.
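
One rough way to run that check is sketched below. The `work` function and the input `xs` are made up for illustration. Both versions are called once before timing so that compilation is not included in the measurement.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical per-item workload.
work(x) = sum(sin, 1:x)

function run_serial(xs)
    out = zeros(length(xs))
    for i in eachindex(xs)
        out[i] = work(xs[i])
    end
    return out
end

function run_threaded(xs)
    out = zeros(length(xs))
    @threads for i in eachindex(xs)
        out[i] = work(xs[i])
    end
    return out
end

xs = fill(10_000, 200)

# Warm-up calls so the timings below exclude compilation.
run_serial(xs)
run_threaded(xs)

t_serial = @elapsed run_serial(xs)
t_threaded = @elapsed run_threaded(xs)
println("threads = ", nthreads(), ", serial = ", t_serial, " s, threaded = ", t_threaded, " s")
```

A single `@elapsed` measurement is noisy, so repeating the timing several times gives a more trustworthy comparison. With one thread, or with very small items, the threaded version may well come out slower.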
