Hi @evan-wehi, I’ve been playing around with this a bit, but I have one question before I write up what I’ve found: when you write `@everywhere levels = rand(np, N)`, do you actually intend each worker to sample its own instance of `levels`? That is what `@everywhere` accomplishes, and it differs from your multithreaded case, where all threads share the same `levels`.

That said, these independent instances of `levels` are never actually used in the rest of your code: they are overwritten by the instance of `levels` captured in the `doit` closure, which is the one sampled on the main process. So in the end, your multiprocessing and multithreaded versions accomplish the same thing; the multiprocessing version just does a lot of unnecessary sampling work up front, which you could eliminate by removing all occurrences of `@everywhere` inside `main_procs`.
So, to clarify: should each parallel thread/process share the same `levels`, or should each use its own independently sampled instance? (I’ve sketched both variants after the MWE below.)
Here's an MWE illustrating what I'm talking about above:
```julia
julia> using Distributed

julia> addprocs(3);  # three workers, matching the output below

julia> function f()
           @everywhere x = rand()  # Samples x independently on each worker (and on the main process)
           printlocal(_) = println(@eval(x))  # Prints the worker's own instance of x
           printmain(_) = println(x)  # Copies x over from the main process and prints it
           pmap(printlocal, 1:3)
           println()
           pmap(printmain, 1:3)
           println()
           pmap(printlocal, 1:3)  # Note how all workers now have the same x due to printmain
           return nothing
       end;

julia> f()
      From worker 3:    0.5299608462035816
      From worker 4:    0.3448114078538431
      From worker 2:    0.7929854408997841
      From worker 3:    0.017534970889111157
      From worker 2:    0.017534970889111157
      From worker 4:    0.017534970889111157
      From worker 4:    0.017534970889111157
      From worker 2:    0.017534970889111157
      From worker 3:    0.017534970889111157
```
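Depending on your answer, here's a rough sketch of how I'd write each variant. The names `levels`, `np`, `N`, and `doit` echo your post, but the array sizes, the `*_local` names, and the function bodies are placeholders I made up; the only point is where the sampling happens and how `levels` reaches the workers.

```julia
using Distributed
addprocs(3)

np, N = 4, 100  # placeholder sizes, just for the sketch

# Variant A: all tasks share one levels (same as your threaded version).
# Sample once on the main process; the closure references it, and pmap ships
# that single instance to every worker, so no @everywhere is needed.
levels = rand(np, N)
doit = i -> sum(levels) + i  # placeholder for the real work
shared = pmap(doit, 1:8)

# Variant B: each worker samples its own levels.
# Sample on every worker with @everywhere ($ interpolates np and N from the
# main process), and define the worker function with @everywhere as well, so
# that levels_local inside it resolves to the worker-local global rather than
# being copied over from the main process.
@everywhere levels_local = rand($np, $N)
@everywhere doit_local(i) = sum(levels_local) + i
independent = pmap(doit_local, 1:8)
```

In variant A, `levels` gets serialized to the workers as part of the closure, so every process works on the same draw; in variant B, nothing large needs to be shipped, but the workers no longer see the same random numbers. Which one is right depends on what your algorithm actually needs.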