Why so many allocations in the loop on partition

Maybe I do the measurements wrong, but I’m surprised that partition performs so badly in this case.

using IterTools, BenchmarkTools
a=rand(1:10,10^5)

function looppart(a)
res=zeros(Int,20)
for (a1,a2) in partition(a,2,1)
    res[a1+a2]+=1
end
res
end

function loopzip(a)
res=zeros(Int,20)
for (a1,a2) in zip((@view a[1:end-1]), @view a[2:end])
    res[a1+a2]+=1
end
res
end

julia> @btime looppart(a);
  16.412 ms (399998 allocations: 21.36 MiB)

julia> @btime loopzip(a);
  71.300 μs (1 allocation: 224 bytes)

1 Like

IterTools.partition was very inefficient until recent fast partition by pepijndevos · Pull Request #108 · JuliaCollections/IterTools.jl · GitHub. With it, the timings are close.
I just wish it was the common practice to make a new release whenever a PR get merged…

3 Likes

How do you upload the version with the optimized function, to do this test?
If it’s not too complicated, what are the changes made to make the function more efficient?
My question comes from trying to solve this problem

You can remove the registry package and reinstall it using the following method:

pkg> add https://github.com/JuliaCollections/IterTools.jl

This will enable you to use the latest changes even if a release was not issued yet.

I would suggest investigating the PR for the changes inspection - I didn’t get into that code myself.

1 Like

here are the new measures after loading the #master

julia> a=rand(1:10,10^5);

julia> function looppart(a)
       res=zeros(Int,20)
       for (a1,a2) in partition(a,2,1)
           res[a1+a2]+=1
       end
       res
       end
looppart (generic function with 1 method)

julia> function loopzip(a)
       res=zeros(Int,20)
       for (a1,a2) in zip((@view a[1:end-1]), @view a[2:end])
           res[a1+a2]+=1
       end
       res
end
loopzip (generic function with 1 method)

julia> function loopzip1(a)
       res=zeros(Int,20)
       for (a1,a2) in zip(a, @view a[2:end])
           res[a1+a2]+=1
       end
       res
end
loopzip1 (generic function with 1 method)

julia> function loopzip2(a)
           res=zeros(Int,20)
           for i in 1:10^5-1
                res[a[i]+a[i+1]]+=1
           end
           res
       end
loopzip2 (generic function with 1 method)


julia> @btime looppart(a);
  64.000 μs (1 allocation: 224 bytes)

julia> @btime loopzip(a);
  70.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip1(a);
  60.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip2(a);
  76.400 μs (1 allocation: 224 bytes)
1 Like

I’m afraid it’s too difficult for me, especially if done in a very general/abstract way
However, I would like to understand at least the principle on which the improvements are based

Another way for this specific problem (not general partition iteration) is:

julia> function looppeek(a)
           res=zeros(Int,20)
           itr = Iterators.Stateful(a)
           @inbounds for v1 in itr
               isnothing(peek(itr)) && break
                res[v1+peek(itr)]+=1
           end
           res
       end
looppeek (generic function with 1 method)

julia> @btime looppeek($a);

It achieves the fast performance with only one allocation.

Looked a bit into the code, and it uses @generated functions to produce a function specific for the number of elements in the returned tuples (2 in this discussion). Generating code with known length of partitions allows the use of tuples (which must have compile-time known lengths and types) and avoids the allocations. Elegant code, which is worth looking at.

2 Likes

in this specific case, a simple for loop assisted by the @inbounds macro seems to do better than the others


a=rand(1:10,10^5)


julia> function looppeek(a)
           res=zeros(Int,20)
           itr = Iterators.Stateful(a)
           @inbounds for v1 in itr
               isnothing(peek(itr)) && break
                res[v1+peek(itr)]+=1
           end
           res
       end
looppeek (generic function with 1 method)


julia> function loopindex(a)
       res=zeros(Int,20)
       @inbounds for i in 1:10^5-1
           res[a[i]+a[i+1]]+=1
       end
       res
       end
loopindex (generic function with 1 method)



julia> @btime looppeek($a);
  60.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip(a);
  71.100 μs (1 allocation: 224 bytes)

julia> @btime loopzip1(a);
  60.400 μs (1 allocation: 224 bytes)

julia> @btime loopindex(a);
  42.400 μs (1 allocation: 224 bytes)