 # Collecting homogenized vectors: modify iterator vs modify collector

Nemo is a computer algebra package for the Julia programming language. For a multivariate polynomial `p` there is an iterator `exponent_vectors(p)` whose values are vectors corresponding to the exponents of the terms of `p` (in some preset monomial order).
I want to collect all the exponents of a huge polynomial in the rows of a matrix and prepend a first column of ones (this is to compute the Newton polytope of `p` in Polymake.jl).
I tried two approaches. The first is to create a new iterator whose values are the exponent vectors of `p` with an extra first component set to one, and then collect this new iterator. The second is to create a matrix of ones of the right size and collect the exponents of `p` into its `2:end` columns.
I observed a big performance difference: the second is roughly twice as fast as the first.
I wonder whether this is to be expected, or whether the problem is my naive implementation of the first approach, which, I think, is more intricate.

``````struct Homogenize{I}
    itr::I
end

homogenize(iter) = Homogenize(iter)

Base.length(h::Homogenize) = length(h.itr)
Base.size(h::Homogenize) = size(h.itr)
Base.eltype(::Type{Homogenize{I}}) where {I} = eltype(I)
Base.IteratorSize(::Type{Homogenize{I}}) where {I} = Base.IteratorSize(I)
Base.IteratorEltype(::Type{Homogenize{I}}) where {I} = Base.IteratorEltype(I)

Base.@propagate_inbounds function Base.iterate(h::Homogenize, state)
    n = iterate(h.itr, state)
    n === nothing && return nothing
    (u, s) = n
    ## Prepend a one to the value; the wrapped iterator's state passes through
    return [oneunit(eltype(u)); u], s
end

Base.@propagate_inbounds function Base.iterate(h::Homogenize)
    n = iterate(h.itr)
    n === nothing && return nothing
    (u, s) = n
    return [oneunit(eltype(u)); u], s
end
``````
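To make the intent concrete, here is the iterator's behavior on a plain vector of vectors, as a self-contained sketch (the `Homogenize` definitions are repeated in condensed form so the snippet runs on its own; `rows` is a made-up stand-in for `exponent_vectors(p)`):

```julia
## Condensed, self-contained version of the iterator above
struct Homogenize{I}
    itr::I
end
homogenize(iter) = Homogenize(iter)
Base.length(h::Homogenize) = length(h.itr)

## One varargs method covers both the start and continuation cases
function Base.iterate(h::Homogenize, state...)
    n = iterate(h.itr, state...)
    n === nothing && return nothing
    (u, s) = n
    return [oneunit(eltype(u)); u], s
end

rows = [[2, 0], [0, 3]]      # e.g. exponent vectors of x^2 + y^3
collect(homogenize(rows))    # yields the vectors [1, 2, 0] and [1, 0, 3]
```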
``````function collect_homoiter(iter)
    ## First iteration outside the loop, to learn the element type and size
    (u, state) = iterate(iter)
    ## Preallocate v with every entry set to one, so the first row needs no extra pass
    v = ones(eltype(u), size(u, 1) + 1, length(iter))
    # v = Matrix{eltype(u)}(undef, size(u,1)+1, length(iter))
    # v[1,:] .= one(eltype(u))
    v[2:end, 1] = u
    i = 1
    next = iterate(iter, state)
    ## Fill the remaining columns; an explicit counter avoids relying on the
    ## iterator state happening to be a valid column index
    while next !== nothing
        (u, state) = next
        i += 1
        v[2:end, i] = u
        next = iterate(iter, state)
    end
    return transpose(v)
end
``````
``````
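Stripped of the iterator machinery, the preallocate-and-fill pattern looks like this on a tiny self-contained example (`rows` is a made-up stand-in for the exponent vectors):

```julia
rows = [[10, 20], [30, 40]]   # stand-in for exponent vectors

## Preallocate a matrix of ones, then fill everything below the first row
v = ones(Int, length(first(rows)) + 1, length(rows))
for (i, u) in pairs(rows)
    v[2:end, i] = u
end
transpose(v)   # 2×3 matrix: [1 10 20; 1 30 40]
```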
``````julia> using BenchmarkTools

julia> M = rand(100,1000);

julia> @benchmark transpose(reduce(hcat,collect(homogenize(eachrow(M)))))

BenchmarkTools.Trial:
memory estimate:  1.63 MiB
allocs estimate:  2404
--------------
minimum time:     554.539 μs (0.00% GC)
median time:      590.871 μs (0.00% GC)
mean time:        668.858 μs (7.42% GC)
maximum time:     3.596 ms (73.20% GC)
--------------
samples:          7422
evals/sample:     1

julia> @benchmark collect_homoiter(eachrow(M))
BenchmarkTools.Trial:
memory estimate:  786.92 KiB
allocs estimate:  106
--------------
minimum time:     292.677 μs (0.00% GC)
median time:      317.791 μs (0.00% GC)
mean time:        338.971 μs (4.16% GC)
maximum time:     1.498 ms (63.73% GC)
--------------
samples:          10000
evals/sample:     1

julia> transpose(reduce(hcat,collect(homogenize(eachcol(M))))) == collect_homoiter(eachcol(M))
true
``````

I think the problem with the first implementation is that you are allocating too many temporary arrays in the `iterate` function. One option would be to do

``````Base.@propagate_inbounds function Base.iterate(h::Homogenize, state)
    n = iterate(h.itr, state)
    n === nothing && return nothing
    (u, s) = n
    ## Copy once, then grow in place instead of allocating via vcat
    return pushfirst!(Vector(u), oneunit(eltype(u))), s
end
``````

This does not work in place with your toy example (since one cannot push into the slices of a matrix; hence the `Vector` copy), but I think in Nemo the exponent vectors are proper arrays, so pushing into them directly would work.
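A small self-contained check of that point: the values produced by `eachrow` are views with a fixed size, so they cannot grow, while a `Vector` copy can:

```julia
M = [10 20; 30 40]
r = first(eachrow(M))        # a view into M, with fixed size
## pushfirst!(r, 1)          # would throw a MethodError: views cannot grow
pushfirst!(Vector(r), 1)     # copying to a Vector first works: [1, 10, 20]
```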

(But I don’t think it will beat the second implementation. There is nothing faster than preallocating and then filling, although `reduce(hcat, ...)` is probably quite optimized in Base.)

Thank you so much. There is indeed some improvement. I adapted the toy example to `pushfirst!`; here are the new timings. An aside question now: why is the memory estimate bigger for `pushfirst!`?

``````julia> N=100; M = rand(N,10*N); C=[M[i,:] for i in 1:N];

julia> @benchmark collect_homoiter(C)
BenchmarkTools.Trial:
memory estimate:  782.19 KiB
allocs estimate:  4
--------------
minimum time:     178.375 μs (0.00% GC)
median time:      195.195 μs (0.00% GC)
mean time:        213.096 μs (4.10% GC)
maximum time:     982.328 μs (78.91% GC)
--------------
samples:          10000
evals/sample:     1

## Using pushfirst!
julia> @benchmark transpose(reduce(hcat,collect(homogenize(C))))
BenchmarkTools.Trial:
memory estimate:  3.08 MiB
allocs estimate:  603
--------------
minimum time:     316.035 μs (0.00% GC)
median time:      354.262 μs (0.00% GC)
mean time:        446.933 μs (15.52% GC)
maximum time:     1.969 ms (61.69% GC)
--------------
samples:          10000
evals/sample:     1

## Using the original vcat-based iterate
julia> @benchmark transpose(reduce(hcat,collect(homogenize(C))))
BenchmarkTools.Trial:
memory estimate:  1.62 MiB
allocs estimate:  2203
--------------
minimum time:     369.206 μs (0.00% GC)
median time:      398.459 μs (0.00% GC)
mean time:        465.969 μs (9.88% GC)
maximum time:     3.922 ms (78.63% GC)
--------------
samples:          10000
evals/sample:     1
``````

I am just guessing here, but `pushfirst!` calls something called `growbeg`, which in turn calls the C function here: https://github.com/JuliaLang/julia/blob/0a77e07c4d964bcec172bde2918097111814a07e/src/array.c#L974. I can imagine that it allocates more space than just one element (similar to how `push!` makes room for more than one element at a time).
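One can observe this over-allocation from the reported allocation size, assuming only that `@allocated` (from Base) reports the bytes allocated by one call: a single `pushfirst!` that has to grow the buffer allocates far more than the `sizeof(Int) == 8` bytes the new element itself needs, because the reallocation reserves spare room at the front for future insertions:

```julia
v = collect(1:1000)
## This pushfirst! forces a reallocation of the whole buffer, plus slack
bytes = @allocated pushfirst!(v, 0)
println(bytes)   # typically much larger than sizeof(Int) == 8
```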
