Why so many allocations in the loop on partition

rocco_sprmnt21 · July 14, 2023, 8:05am

Maybe I do the measurements wrong, but I’m surprised that partition performs so badly in this case.

using IterTools, BenchmarkTools
a=rand(1:10,10^5)

function looppart(a)
res=zeros(Int,20)
for (a1,a2) in partition(a,2,1)
    res[a1+a2]+=1
end
res
end

function loopzip(a)
res=zeros(Int,20)
for (a1,a2) in zip((@view a[1:end-1]), @view a[2:end])
    res[a1+a2]+=1
end
res
end

julia> @btime looppart(a);
  16.412 ms (399998 allocations: 21.36 MiB)

julia> @btime loopzip(a);
  71.300 μs (1 allocation: 224 bytes)

aplavin · July 14, 2023, 11:55am

IterTools.partition was very inefficient until recent fast partition by pepijndevos · Pull Request #108 · JuliaCollections/IterTools.jl · GitHub. With it, the timings are close.
I just wish it was the common practice to make a new release whenever a PR get merged…

rocco_sprmnt21 · July 14, 2023, 1:05pm

How do you upload the version with the optimized function, to do this test?
If it’s not too complicated, what are the changes made to make the function more efficient?
My question comes from trying to solve this problem

algunion · July 14, 2023, 1:12pm

You can remove the registry package and reinstall it using the following method:

pkg> add https://github.com/JuliaCollections/IterTools.jl

This will enable you to use the latest changes even if a release was not issued yet.

I would suggest investigating the PR for the changes inspection - I didn’t get into that code myself.

rocco_sprmnt21 · July 14, 2023, 1:38pm

here are the new measures after loading the #master

julia> a=rand(1:10,10^5);

julia> function looppart(a)
       res=zeros(Int,20)
       for (a1,a2) in partition(a,2,1)
           res[a1+a2]+=1
       end
       res
       end
looppart (generic function with 1 method)

julia> function loopzip(a)
       res=zeros(Int,20)
       for (a1,a2) in zip((@view a[1:end-1]), @view a[2:end])
           res[a1+a2]+=1
       end
       res
end
loopzip (generic function with 1 method)

julia> function loopzip1(a)
       res=zeros(Int,20)
       for (a1,a2) in zip(a, @view a[2:end])
           res[a1+a2]+=1
       end
       res
end
loopzip1 (generic function with 1 method)

julia> function loopzip2(a)
           res=zeros(Int,20)
           for i in 1:10^5-1
                res[a[i]+a[i+1]]+=1
           end
           res
       end
loopzip2 (generic function with 1 method)


julia> @btime looppart(a);
  64.000 μs (1 allocation: 224 bytes)

julia> @btime loopzip(a);
  70.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip1(a);
  60.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip2(a);
  76.400 μs (1 allocation: 224 bytes)

rocco_sprmnt21 · July 14, 2023, 3:18pm

I’m afraid it’s too difficult for me, especially if done in a very general/abstract way
However, I would like to understand at least the principle on which the improvements are based

Dan · July 14, 2023, 5:17pm

Another way for this specific problem (not general partition iteration) is:

julia> function looppeek(a)
           res=zeros(Int,20)
           itr = Iterators.Stateful(a)
           @inbounds for v1 in itr
               isnothing(peek(itr)) && break
                res[v1+peek(itr)]+=1
           end
           res
       end
looppeek (generic function with 1 method)

julia> @btime looppeek($a);

It achieves the fast performance with only one allocation.

Looked a bit into the code, and it uses @generated functions to produce a function specific for the number of elements in the returned tuples (2 in this discussion). Generating code with known length of partitions allows the use of tuples (which must have compile-time known lengths and types) and avoids the allocations. Elegant code, which is worth looking at.

rocco_sprmnt21 · July 14, 2023, 5:39pm

in this specific case, a simple for loop assisted by the @inbounds macro seems to do better than the others


a=rand(1:10,10^5)


julia> function looppeek(a)
           res=zeros(Int,20)
           itr = Iterators.Stateful(a)
           @inbounds for v1 in itr
               isnothing(peek(itr)) && break
                res[v1+peek(itr)]+=1
           end
           res
       end
looppeek (generic function with 1 method)


julia> function loopindex(a)
       res=zeros(Int,20)
       @inbounds for i in 1:10^5-1
           res[a[i]+a[i+1]]+=1
       end
       res
       end
loopindex (generic function with 1 method)



julia> @btime looppeek($a);
  60.500 μs (1 allocation: 224 bytes)

julia> @btime loopzip(a);
  71.100 μs (1 allocation: 224 bytes)

julia> @btime loopzip1(a);
  60.400 μs (1 allocation: 224 bytes)

julia> @btime loopindex(a);
  42.400 μs (1 allocation: 224 bytes)

Topic		Replies	Views
Using Base.Iterators for optimization, good idea? Performance question	0	196	October 7, 2023
IterTools.jl Partition type stability? Performance	3	680	August 22, 2018
Best way to store integer partitions Performance	6	951	December 15, 2019
Unexpected allocation Performance	7	306	January 30, 2024
Weird allocation issue for fill! with ArrayPartition in Julia < 1.6 Performance array , memory-allocation	1	354	May 17, 2021

Why so many allocations in the loop on partition

Related topics