I have just put up this small package, which provides a chunks function for use within threaded loops. It is something I find useful, and the need for it comes up fairly often here:
This is so basic that I always have the impression that it must exist already, perhaps even in Base as some kind of iterator function.
Yet, for example, Iterators.partition is not what one wants, because the resulting partition is not as even as possible:
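A minimal illustration, using only Base and the same sizes as the example further down (7 elements, 3 chunks). Note that Iterators.partition takes the length of each piece rather than the number of pieces, so the piece length here is chosen as `cld(7, 3)`; that choice is an assumption made for this sketch:

```julia
# Base-only sketch: fixed-length pieces leave a short last piece.
n, nchunks = 7, 3
piece_length = cld(n, nchunks)   # 3: smallest piece length that gives at most 3 pieces

parts = collect(Iterators.partition(1:n, piece_length))
lens = length.(parts)

@show parts
@show lens   # the last piece is shorter than the others
```

The pieces have lengths 3, 3 and 1, whereas the most even split of 7 items into 3 chunks has lengths 3, 2 and 2.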
Nice! After looking at IteratorTools, I have now implemented the splitting as an iterator (Julia is actually cool, isn’t it?).
It works like this:
julia> using ChunkSplitters
julia> x = rand(7);
julia> Threads.@threads for (range,ichunk) in chunks(x, 3, :batch)
           @show (range, ichunk)
       end
(range, ichunk) = (6:7, 3)
(range, ichunk) = (1:3, 1)
(range, ichunk) = (4:5, 2)
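The ranges printed above can be reproduced with a few lines of plain Julia. The sketch below is not the ChunkSplitters source; `even_chunk_range` is a hypothetical helper written only to show one way of computing contiguous chunks whose lengths differ by at most one:

```julia
# Hypothetical helper (not the ChunkSplitters implementation): index range of
# chunk `ichunk` when `n` items are split into `nchunks` contiguous pieces.
function even_chunk_range(n::Int, nchunks::Int, ichunk::Int)
    1 <= ichunk <= nchunks || throw(ArgumentError("ichunk must be in 1:nchunks"))
    # Every chunk gets `base` items; the first `extra` chunks get one more.
    base, extra = divrem(n, nchunks)
    first_index = (ichunk - 1) * base + min(ichunk - 1, extra) + 1
    len = base + (ichunk <= extra ? 1 : 0)
    return first_index:(first_index + len - 1)
end

ranges = [even_chunk_range(7, 3, i) for i in 1:3]
@show ranges
```

For 7 items and 3 chunks this gives 1:3, 4:5 and 6:7, the same ranges as in the threaded loop above (there they are printed in whatever order the threads happen to run).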
With chunks, we can write a threaded sum somewhat more cleanly:
julia> using ChunkSplitters
julia> function sum_parallel(f, x; nchunks=Threads.nthreads())
           s = fill(zero(eltype(x)), nchunks)  # one partial sum per chunk
           Threads.@threads for (range, ichunk) in chunks(x, nchunks)
               for i in range
                   s[ichunk] += f(x[i])  # each chunk writes only to its own slot
               end
           end
           return sum(s)
       end
sum_parallel (generic function with 1 method)
julia> using BenchmarkTools
julia> x = rand(10^7);
julia> Threads.nthreads()
12
julia> @btime sum(x -> log(x)^7, $x)
115.026 ms (0 allocations: 0 bytes)
-5.062317099586189e10
julia> @btime sum_parallel(x -> log(x)^7, $x; nchunks=128)
19.210 ms (74 allocations: 7.58 KiB)
-5.062317099585973e10
That performs nicely, for example in comparison with:
And I may register the package anyway, so that I can experiment more with the functionality (also, I don’t know when, or even if, the PR will be accepted).