Looking for tips on parallelism for Differential Equations Problems

Hey, community! I am new to Julia and to multithreading/parallel computing. I am trying to clarify some things that I’ve read and to find the best way to solve my problem given my computational resources. My problem is: I would like to solve the same finite volume problem many times with different arguments, as fast as possible, on a single personal computer.

I have one function, fv_strucgrid(Nx, Ny, Lx, Ly, bc, scr), that solves a finite volume problem on a structured grid of size Lx by Ly, divided into Nx and Ny cells respectively. Here bc is a boundary condition vector and scr is a source term function, in my case scrfun(x,y) = 2*cos.(x)'.*cos.(y) .

My version of Julia is

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 4

and my computer's specs are

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               69
Model name:          Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
Stepping:            1
CPU MHz:             1788.265
CPU max MHz:         3100,0000
CPU min MHz:         800,0000
BogoMIPS:            5187.55
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            4096K
NUMA node0 CPU(s):   0-3

The four main ways I found to run my code in parallel are pmap, @distributed for, Threads.@threads for, and Threads.@spawn inside a for loop. Here are the examples:

# 1. pmap (needs `using Distributed`)
pmap((args) -> fv_strucgrid(args...), [[320, 320, pi, pi, rand(2*320+2*320), scrfun] for i = 1:10000])

# 2. @distributed for (needs `using Distributed`)
@distributed for i = 1:10000
    fv_strucgrid(320, 320, pi, pi, rand(2*320+2*320), scrfun)
end

# 3. Threads.@threads for
Threads.@threads for i = 1:10000
    fv_strucgrid(320, 320, pi, pi, rand(2*320+2*320), scrfun)
end

# 4. Threads.@spawn inside a for loop
@sync for i = 1:10000
    Threads.@spawn fv_strucgrid(320, 320, pi, pi, rand(2*320+2*320), scrfun)
end

The only thing that changes in each iteration is the boundary condition. So my question is: could anyone explain, or point me to any material/link that explains, the difference between these methods and help me choose the best option for my computer? Code examples are welcome too :) Thanks!!

If you’re on one computer, use multithreading instead of distributed, and Threads.@threads will likely have the least overhead.
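To make the Threads.@threads suggestion concrete, here is a minimal sketch. It assumes the solver returns a single number per run and does not mutate shared state. Since fv_strucgrid itself is not shown in the thread, fv_strucgrid_stub and run_all are hypothetical stand-ins, not the real solver. The boundary conditions are generated before the loop, so each iteration only reads its own input and writes to its own slot of the result vector.

```julia
using Base.Threads

# Source term from the original post.
scrfun(x, y) = 2 * cos.(x)' .* cos.(y)

# Hypothetical stand-in for fv_strucgrid (the real solver is not shown).
# It evaluates the source term on the grid and folds in the boundary vector.
function fv_strucgrid_stub(Nx, Ny, Lx, Ly, bc, scr)
    x = range(0.0, float(Lx); length = Nx)
    y = range(0.0, float(Ly); length = Ny)
    return sum(scr(x, y)) + sum(bc)
end

# Hypothetical driver: one solve per boundary condition, spread over threads.
function run_all(nruns; Nx = 320, Ny = 320)
    # One boundary vector per run, built before the threaded loop.
    bcs = [rand(2Nx + 2Ny) for _ in 1:nruns]
    # Preallocated output; iteration i writes only to results[i].
    results = Vector{Float64}(undef, nruns)
    @threads for i in 1:nruns
        results[i] = fv_strucgrid_stub(Nx, Ny, pi, pi, bcs[i], scrfun)
    end
    return results
end

results = run_all(100)
println("threads: ", nthreads(), ", runs: ", length(results))
```

Julia has to be started with JULIA_NUM_THREADS set (4 in the versioninfo() output above) for the loop to use more than one thread. If the real solver returns an array rather than a number, the element type of results would need to change accordingly.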
