Automatic Parallelization in Julia

Honza9723 · July 5, 2020, 1:15am

Dear All,

I would like to ask whether Julia has an automatic parallelization tool similar to the auto-parallelization capabilities of the Intel/gcc compilers for Fortran/C++. It would be really awesome if the Julia compiler could transform standard serial code into parallel code!

Best,
Honza

ChrisRackauckas · July 5, 2020, 1:27am

Yeah, you can do all kinds of things. It depends on what level you call automated, though. There are things like CuArrays and DistributedArrays that recompile your code to GPUs and distributed CPUs respectively, KernelAbstractions.jl that recompiles quite a big set of Julia code to GPUs, and recently things like ModelingToolkit that will take a Julia ODE code and recompile it in a multithreaded way:

https://mtk.sciml.ai/dev/tutorials/auto_parallel/

So there’s all kinds of things you can do. You’d have to be a bit more specific.

Honza9723 · July 5, 2020, 1:46am

Thank you for your reply!
I meant “highest-level”/implicit parallelism. As far as I understand the Intel Fortran/C++ compiler, it takes a whole program and (if the auto-par option is enabled) automatically searches the whole code for parallelizable parts (loops, etc.), so the programmer doesn’t have to explicitly declare which parts of the code should be parallelized. Especially for less experienced programmers (like me), I would guess that good compiler optimization could provide better results than explicit parallelization.

Oscar_Smith · July 5, 2020, 1:52am

Do you have any links for this? I’m unaware of this, but would be really interested in reading about it.

Honza9723 · July 5, 2020, 2:01am

There is some description by Intel.

Also, the Wikipedia article is pretty cool.

Mason · July 5, 2020, 3:05am

As far as I understand, the general philosophy that’s been taken so far by the Julia developers is that an optimization should only be automatically applied if they know for sure that:

1. It’s correct / safe to apply the optimization.
2. The optimization won’t accidentally hurt performance.

Unfortunately, implicit multi-threading makes both of the above criteria very difficult to satisfy. Even if the safety / correctness concern were satisfied (which is not trivial to do), multi-threading has a lot of overhead. The general heuristic is that it takes about 1 microsecond to spawn a multi-threaded task in Julia which is on the order of 1000 CPU cycles. This means that if I write

for i in 1:N
    f(i)
end

if it takes less than ~10 microseconds to run that loop, it was probably a mistake to try and multi-thread it. However, the amount of time the loop takes to run depends not only on N, but also on the details of f. Knowledge about how to handle this right is not something our compiler currently has or is likely to have anytime soon.
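
To get a feel for the overhead argument, here is a rough sketch that times a plain loop against a deliberately naive version that spawns one task per iteration. The names `cheap`, `serial_sum` and `spawn_per_iteration_sum` are made up for illustration, and the timings will vary a lot by machine and Julia version, so treat the output as indicative only.

```julia
using Base.Threads: @spawn

# Hypothetical cheap workload, used only for illustration.
cheap(i) = sqrt(i)

# Serial version: a plain loop.
function serial_sum(n)
    s = 0.0
    for i in 1:n
        s += cheap(i)
    end
    return s
end

# Deliberately naive version: one task per iteration, so for cheap work
# the per-task overhead is expected to dominate the runtime.
function spawn_per_iteration_sum(n)
    tasks = [@spawn cheap(i) for i in 1:n]
    return sum(fetch, tasks)
end

n = 1_000
serial_sum(n); spawn_per_iteration_sum(n)   # warm up to exclude compile time
t_serial = @elapsed serial_sum(n)
t_spawn = @elapsed spawn_per_iteration_sum(n)
println("serial: ", t_serial, " s, one task per iteration: ", t_spawn, " s")
```

With a workload this small, the spawned version is usually much slower than the serial loop, which is the kind of case a compiler would have to rule out before parallelizing automatically.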

Instead, we generally insist that the programmer opts in to optimizations like multi-threading explicitly because they know more about their program than the compiler. However, we generally try to make it very easy to opt into these sorts of things which is where things like the performance annotations in base (Threads.@threads, @simd, @fastmath, etc.), and various packages like KernelAbstractions.jl, LoopVectorization.jl and ThreadsX.jl come in.
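
As a minimal sketch of what explicit opt-in can look like, the loop below is annotated with `Threads.@threads`. The function names `work` and `threaded_map!` are hypothetical, and the example assumes each iteration is independent and expensive enough to outweigh the scheduling overhead.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical per-element workload, meant to be costly enough to thread.
work(x) = sum(sin(x + k) for k in 1:1_000)

function threaded_map!(out, xs)
    # Each iteration writes only to its own slot, so there is no data race.
    @threads for i in eachindex(out, xs)
        out[i] = work(xs[i])
    end
    return out
end

xs = collect(range(0.0, 1.0; length = 10_000))
out = threaded_map!(similar(xs), xs)
println("ran on ", nthreads(), " thread(s)")
```

Julia only uses more than one thread if it was started that way, for example by setting the `JULIA_NUM_THREADS` environment variable before launching it.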
