I was experimenting with using threading to speed up a loop over vcat that swaps the first and second halves of each vector in a sequence of vectors. I started Julia and verified the number of threads:
    julia> Threads.nthreads()
    4
I wrote the following simple test routine, which can be copied and pasted into the REPL:
    function f1(N::Int, J::Int)
        x = [ collect(1:N) for j = 1:J ]
        nhalf = floor(Int, N/2)
        for j = 1:J
            x[j][:] = vcat(x[j][nhalf+1:end], x[j][1:nhalf])
        end
        return x
    end

    function f1_thread(N::Int, J::Int)
        x = [ collect(1:N) for j = 1:J ]
        nhalf = floor(Int, N/2)
        Threads.@threads for j = 1:J
            x[j][:] = vcat(x[j][nhalf+1:end], x[j][1:nhalf])
        end
        return x
    end

    x1 = f1(10, 100);
    x2 = f1_thread(10, 100);

    using BenchmarkTools
    @btime f1(10000, 10000);
    @btime f1_thread(10000, 10000);
The threaded version is about 10% faster.
Is the conclusion here that this simply isn’t a good operation for threading? Is the problem that the overhead of setting up the threaded loop is greater than the savings from threading? If so, are there any other options for parallelizing loops like this?
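One thing that may be worth ruling out first: both timed functions also build x serially inside the benchmark, and each iteration allocates temporary vectors (the two slices plus the vcat result). A minimal sketch for timing only the loop, without those temporaries, might look like the following. The names swap_halves!, swap_all!, and swap_all_threaded! are hypothetical helpers, not part of the code above, and writing into a preallocated output y is a swapped-in alternative to the in-place x[j][:] = vcat(...) assignment.

```julia
# Copy src into dest with its two parts swapped, matching
# vcat(src[nhalf+1:end], src[1:nhalf]) but without temporary vectors.
# Assumes dest and src are distinct vectors of the same length.
function swap_halves!(dest::Vector{T}, src::Vector{T}) where {T}
    n = length(src)
    nhalf = n ÷ 2
    copyto!(dest, 1, src, nhalf + 1, n - nhalf)   # tail of src goes first
    copyto!(dest, n - nhalf + 1, src, 1, nhalf)   # head of src goes last
    return dest
end

# Serial loop over all vectors.
function swap_all!(y, x)
    for j in eachindex(x)
        swap_halves!(y[j], x[j])
    end
    return y
end

# Threaded loop; each iteration writes only to y[j], so iterations are independent.
function swap_all_threaded!(y, x)
    Threads.@threads for j in eachindex(x)
        swap_halves!(y[j], x[j])
    end
    return y
end

# Setup is done once, outside anything that gets timed.
N, J = 1_000, 100
x = [collect(1:N) for j in 1:J]
y = [Vector{Int}(undef, N) for j in 1:J]
swap_all!(y, x)
swap_all_threaded!(y, x)
```

With BenchmarkTools loaded, @btime swap_all!($y, $x) and @btime swap_all_threaded!($y, $x) would then time only the loops, which should make it easier to tell whether thread start-up or allocation accounts for the small difference.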
I also tried using @distributed, but the cost of applying x massively outweighed any gains.
I also tried SharedArray and had a similar issue with the setup costs. When I tried larger examples, I also kept getting errors of the form:
ERROR: On worker 3: SystemError: shm_open() failed for /jl011485a0lWXYifpyIZJdgO1sIM: Too many open files
So, is the conclusion here that this just isn’t an operation that can be sped up by using multiple cores?
Cheers, and thanks to all responders.