Simple parallel chunk splitter?

I have just put up this small package, which provides a chunks function to be used within threaded loops. It is something I find useful, and the need seems to come up somewhat often here:

This is so basic that I always have the impression it must have been done already, perhaps even in Base as some kind of Iterators function.

Yet, for example, Iterators.partition is not what one wants, because the partition it produces is not the most even one possible:

julia> collect.(collect(Iterators.partition((1:7), 3)))
3-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5, 6]
 [7]

vs.

julia> collect.(ChunkSplitters.chunks(1:7, i, 3) for i in 1:3)
3-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5]
 [6, 7]
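
The split I am after is the one where chunk lengths differ by at most one; just to make "most even" concrete, a minimal sketch of the arithmetic (not the package's actual code):

# base length and remainder of the division; the remainder is spread
# over the first chunks, so lengths differ by at most one
n, nchunks = 7, 3
len, rem = divrem(n, nchunks)
lengths = [len + (i <= rem) for i in 1:nchunks]   # [3, 2, 2]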

Thus, one question: Is there such simple functionality in Base or in a light-dependency package?

And, if not, I hope this small package is useful for others, in which case I may register it.

3 Likes

Maybe this could be added to IterTools.jl? It already includes a partition function, generalized in a different direction.
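
If I remember its API correctly, IterTools.partition takes a fixed group size and an optional step, so it leans toward (possibly overlapping) fixed-size windows rather than an even split of the whole range; a quick sketch:

using IterTools

# windows of length 3, advancing by 2 at each step
collect(IterTools.partition(1:7, 3, 2))   # ((1, 2, 3), (3, 4, 5), (5, 6, 7))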

2 Likes

That seems like a good package to have this in. I’ll try to make a PR.

Yet another related package:

I don’t think it implements the splitting policies you want, but it targets very similar use cases.

1 Like

Nice! By looking at IterTools I have now implemented the splitting as an iterator (Julia is actually cool, isn’t it?).

It works like this:

julia> using ChunkSplitters 

julia> x = rand(7);

julia> Threads.@threads for (range,ichunk) in chunks(x, 3, :batch)
           @show (range, ichunk)
       end
(range, ichunk) = (6:7, 3)
(range, ichunk) = (1:3, 1)
(range, ichunk) = (4:5, 2)
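
For the curious, a rough sketch of what such an iterator can look like with Julia's iteration and indexing interfaces (a hypothetical EvenChunks type, not the package's actual code; the indexing methods are what allow Threads.@threads to consume it directly):

struct EvenChunks{T}
    itr::T
    nchunks::Int
end

# index range of chunk `i`, again with lengths differing by at most one
function chunk_range(c::EvenChunks, i)
    len, rem = divrem(length(c.itr), c.nchunks)
    start = firstindex(c.itr) + (i - 1) * len + min(i - 1, rem)
    return start:(start + len - 1 + (i <= rem ? 1 : 0))
end

Base.length(c::EvenChunks) = c.nchunks
Base.firstindex(c::EvenChunks) = 1
Base.lastindex(c::EvenChunks) = c.nchunks
Base.getindex(c::EvenChunks, i::Int) = (chunk_range(c, i), i)
Base.iterate(c::EvenChunks, i = 1) =
    i > c.nchunks ? nothing : ((chunk_range(c, i), i), i + 1)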

With the chunks iterator we can then write, slightly more cleanly:

julia> using ChunkSplitters

julia> function sum_parallel(f, x; nchunks=Threads.nthreads())
           s = fill(zero(eltype(x)), nchunks)
           Threads.@threads for (range, ichunk) in chunks(x, nchunks)
               for i in range
                   s[ichunk] += f(x[i])
               end
           end
           return sum(s)
       end
sum_parallel (generic function with 1 method)

julia> x = rand(10^7);

julia> Threads.nthreads()
12

julia> using BenchmarkTools

julia> @btime sum(x -> log(x)^7, $x)
  115.026 ms (0 allocations: 0 bytes)
-5.062317099586189e10

julia> @btime sum_parallel(x -> log(x)^7, $x; nchunks=128)
  19.210 ms (74 allocations: 7.58 KiB)
-5.062317099585973e10

That performs nicely, for example in comparison with:

julia> using ThreadsX

julia> @btime ThreadsX.sum(x -> log(x)^7, $x)
  18.127 ms (1103 allocations: 74.14 KiB)

Of course, in this case there is no reason not to use ThreadsX, but the splitter allows more customized handling of other threading problems.
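
As one example of that flexibility, a small sketch (with a hypothetical sum_spawn, assuming the same chunks iterator as above) of spawning one task per chunk with Threads.@spawn instead of using @threads:

using ChunkSplitters

# one task per chunk; each task reduces its own view of the data,
# and the partial results are fetched and summed at the end
function sum_spawn(f, x; nchunks = Threads.nthreads())
    tasks = map(chunks(x, nchunks)) do (range, _)
        Threads.@spawn sum(f, @view x[range])
    end
    return sum(fetch.(tasks))
end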

1 Like

I’ve made a pull request to IterTools: chunks by lmiq · Pull Request #95 · JuliaCollections/IterTools.jl · GitHub

And I may register the package anyway, so that I can experiment more with the functionality (also, I don’t know when, or even if, the PR will be accepted).