Hi, I’m quite familiar with the workings of OpenMP and understand the parallel programming concepts of the fork-join model. I’m just starting out with Julia and was hoping to understand multithreading in Julia better. However, I could not find any good resources to help me out. I couldn’t even find a way to manually distribute work across threads (creating parallel regions, declaring private variables, etc.). I tried the @threads macro but I’m not getting any speedups. I was trying this on a very simple problem of adding a scalar to each element of a matrix.
using Base.Threads

function colAccess(A::AbstractMatrix)
    for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            A[j, i] = A[j, i] + 1
        end
    end
    return A
end

function parallelAccess!(A::AbstractMatrix)
    @threads for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            A[j, i] = A[j, i] + 1
        end
    end
    nothing
end
I’m getting some memory allocations when running the parallel code, which I think I might be able to avoid if I could distribute the work manually.
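For reference, one way to distribute the work over the columns manually is to split them into chunks and spawn one task per chunk. This is only a sketch under my own assumptions (the chunking scheme and the name parallelAccessChunks! are illustrative, not from the thread):

using Base.Threads

# Split the columns into roughly nthreads() chunks and spawn one task per
# chunk; each task updates a disjoint set of columns.
function parallelAccessChunks!(A::AbstractMatrix)
    ncols  = size(A, 2)
    chunks = Iterators.partition(1:ncols, cld(ncols, nthreads()))
    tasks  = map(chunks) do cols
        @spawn for i in cols, j in 1:size(A, 1)
            A[j, i] += 1
        end
    end
    foreach(wait, tasks)   # wait for every chunk to finish
    return A
end

Spawning a handful of tasks by hand like this gives explicit control over how the columns are split, which is roughly the manual distribution you describe.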
Something like that may be memory bound. Do you get any speedup on the same problem using OpenMP?
(Remember to start Julia with -t N, of course.)
You may avoid allocations and get better performance for these fast operations with another threading scheme, such as @batch from Polyester.jl or @tturbo from LoopVectorization.jl.
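As a rough sketch of what the Polyester route looks like (assuming Polyester.jl is installed; the function name is made up), @batch is used like @threads on simple loops:

using Polyester  # provides the @batch macro

# Same column-wise update, but with Polyester's lightweight threading.
function parallelAccessBatch!(A::AbstractMatrix)
    @batch for i in 1:size(A, 2)
        for j in 1:size(A, 1)
            A[j, i] += 1
        end
    end
    return A
end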
I haven’t tried running this with OpenMP, but what I’m really interested in is how I can implement fine-grained parallelism in Julia.
I’m just trying to understand the concepts; that’s why I’m implementing all this from scratch. For real-world applications I would certainly use libraries, but right now I want to understand the workings of multithreading in Julia.
Thanks, I read a few articles, but it looks like multithreading in Julia isn’t as powerful (or evolved) as something like OpenMP. I hope it gets better someday, but until then I’ll have to do most of my work in C.
I was really hoping to have control over things like having variables private to threads, but from what I see there’s no way to do that.
I think this will trigger (on Monday) interesting discussions. I’m certainly not qualified here. But, for example, if you use a pattern like
@threads for it in 1:nthreads()
    s = 0
    # do something with s
end
The variable s is local to the scope of the loop iteration, and thus visible only to the thread in which it was created. I’m not sure if this is the kind of pattern you get with private variables in OpenMP.
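To make the analogy concrete, here is a minimal sketch (the name threaded_sum and the striding scheme are my own) where each task keeps its own accumulator and the partial results are combined afterwards, roughly what an OpenMP private/reduction variable gives you:

using Base.Threads

# Each task has its own accumulator `s` (local to the loop body), and the
# per-task partial sums are combined at the end, like an OpenMP reduction.
function threaded_sum(x::AbstractVector)
    partials = zeros(eltype(x), nthreads())
    @threads for it in 1:nthreads()
        s = zero(eltype(x))               # private to this task
        for k in it:nthreads():length(x)  # strided slice handled by task `it`
            s += x[k]
        end
        partials[it] = s                  # one write per task, no sharing
    end
    return sum(partials)
end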
Try to do something more substantial than just accessing the elements of the array, otherwise the overhead of multithreading swamps any speedup:
% julia -t4 -q

julia> using Base.Threads, BenchmarkTools

julia> function colAccess!(A::AbstractMatrix)
           for i in 1:size(A)[1]
               for j in 1:size(A)[2]
                   A[j,i] = sin(A[j,i])
               end
           end
           return A
       end
colAccess! (generic function with 1 method)

julia> function parallelAccess!(A::AbstractMatrix)
           @threads for i in 1:size(A)[1]
               for j in 1:size(A)[2]
                   A[j,i] = sin(A[j,i])
               end
           end
           return A
       end
parallelAccess! (generic function with 1 method)

julia> @btime colAccess!(A) setup=(A=rand(100, 100));
  66.189 μs (0 allocations: 0 bytes)

julia> @btime parallelAccess!(A) setup=(A=rand(100, 100));
  24.701 μs (21 allocations: 1.86 KiB)
Maybe it’s just me, but I think having a structure like OpenMP’s would be so intuitive and easy to work with. In Julia I can’t even find a way to implement something like a critical section.
I found a bunch of functions like Threads.atomic_add!, but that doesn’t serve the general purpose of having a critical section inside a parallel region.
In fact, not having a way to create a parallel region using some macro is quite disappointing (if there is something like that, please let me know, because I can’t find anything).
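For what it’s worth, the usual Julia equivalent is a lock: @threads plays the role of the parallel region, and a ReentrantLock guards the critical section. A minimal sketch, with an illustrative counter:

using Base.Threads

# @threads ≈ parallel region; the lock-protected block ≈ critical section.
function locked_count(n)
    lk = ReentrantLock()
    total = Ref(0)
    @threads for i in 1:n
        # ... per-iteration work, private to this task ...
        lock(lk) do
            total[] += 1   # only one task at a time runs this block
        end
    end
    return total[]
end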
A good idea is to provide a more realistic example of your application. Even if you happen to find out that what you want is hard to do in Julia, the threads here are usually very instructive.
If you don’t want to use a macro for some reason, everything that can be done in a parallel for loop can be constructed by just using a parallel reduce implementation like Folds.reduce. Transducers.jl makes it easy to write such parallel programs using just functions.
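A small sketch of that style (assuming Folds.jl is installed; the data is made up):

using Folds  # parallel fold/reduce/map with the same shape as the Base API

xs = rand(10^6)
s = Folds.reduce(+, xs)           # parallel reduction, no explicit @threads loop
t = Folds.sum(x -> sin(x)^2, xs)  # a map-reduce expressed as a single call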
Would you guys have any suggestions with regard to AlphaZero? I know the notes are not perfect; however, I have tried my best to provide as much, and as precise, info as possible at that moment. Here is the link to the topic titled “Questions on parallelization”: Questions on parallelization · Issue #71 · jonathan-laurent/AlphaZero.jl · GitHub
Hi, I am very sorry for the delay in replying. I am (usually) on European time. I am afraid I am not quite in a position to add anything to what is written in the papers I listed above.
Off topic: BTW, if you are into OpenMP, I am just wondering, are you maybe familiar with the Global Address Space toolbox (https://github.com/kpamnany/gasp)? Do you know if AlphaZero computations are maybe irregular?
I found the paper: “Dtree: Dynamic Task Scheduling at Petascale” (http://www.cc.gatech.edu/~echow/pubs/dtree.pdf), Kiran Pamnany, Sanchit Misra, Vasimuddin Md., Xing Liu, Edmond Chow, and Srinivas Aluru.
I am particularly interested in your opinion with regard to running AI software, particularly AlphaZero with it. Are there any advantages to be expected over standard Julia parallel abilities? Should you have any comments please let me know.