Collecting homogenized vectors: modify iterator vs modify collector

Nemo is a computer algebra package for the Julia programming language. For a multivariate polynomial p there is an iterator exponent_vectors(p) whose values are the exponent vectors of the terms of p (in some preset monomial order).
I want to collect all the exponents of a huge polynomial in the rows of a matrix and add a first column of ones (this is to compute the Newton polytope of p in Polymake.jl).
I tried two approaches. The first is to create a new iterator whose values are the exponent vectors of p with an extra first component set to one, and then collect this new iterator. The second is to create a matrix of ones of the right size and write the exponent vectors of p into all but its first row (transposing at the end).
I observed a big performance difference: the second is roughly two times faster than the first.
I wonder whether this is to be expected, or whether the problem is my naive implementation of the first approach, which, I think, is more intricate.
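For concreteness, here is a small sketch of the target layout (the toy exponent vectors and the pushfirst!-based one-liner are illustrative, not Nemo's exponent_vectors API):

```julia
# Toy exponent vectors; each inner vector holds one term's exponents.
exps = [[2, 0], [0, 3]]

# Prepend a 1 to each vector, then stack them as the rows of a matrix.
A = transpose(reduce(hcat, [pushfirst!(copy(e), 1) for e in exps]))
# A == [1 2 0; 1 0 3]
```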

struct Homogenize{I}
    itr::I
end

homogenize(iter) = Homogenize(iter)

Base.length(h::Homogenize) = Base.length(h.itr)
Base.size(h::Homogenize) = Base.size(h.itr)
Base.eltype(::Type{Homogenize{I}}) where {I} = eltype(I)
Base.IteratorSize(::Type{Homogenize{I}}) where {I} = Base.IteratorSize(I)
Base.IteratorEltype(::Type{Homogenize{I}}) where {I} = Base.IteratorEltype(I)

Base.@propagate_inbounds function Base.iterate(h::Homogenize, state)
    n = iterate(h.itr, state)
    n === nothing && return n
    return [oneunit(Base.eltype(n[1])); n[1]], n[2]
end

Base.@propagate_inbounds function Base.iterate(h::Homogenize)
    n = iterate(h.itr)
    n === nothing && return n
    return [oneunit(Base.eltype(n[1])); n[1]], n[2]
end
function collect_homoiter(iter)
    ## First iteration outside the loop, to learn the element type and size
    (u, state) = iterate(iter)
    ## Preallocate v with a first row of ones
    v = ones(eltype(u), size(u, 1) + 1, length(iter))
    # v = Matrix{eltype(u)}(undef, size(u, 1) + 1, length(iter))
    # v[1, :] .= one(eltype(u))
    v[2:end, 1] = u
    i = 1
    next = iterate(iter, state)
    ## Fill the remaining columns, keeping an explicit column counter
    ## (the iterator state is not, in general, a valid column index)
    while next !== nothing
        (u, state) = next
        i += 1
        v[2:end, i] = u
        next = iterate(iter, state)
    end
    return transpose(v)
end
julia> using BenchmarkTools

julia> M = rand(100,1000);

julia> @benchmark transpose(reduce(hcat,collect(homogenize(eachrow(M)))))

BenchmarkTools.Trial: 
  memory estimate:  1.63 MiB
  allocs estimate:  2404
  --------------
  minimum time:     554.539 μs (0.00% GC)
  median time:      590.871 μs (0.00% GC)
  mean time:        668.858 μs (7.42% GC)
  maximum time:     3.596 ms (73.20% GC)
  --------------
  samples:          7422
  evals/sample:     1

julia> @benchmark collect_homoiter(eachrow(M))
BenchmarkTools.Trial: 
  memory estimate:  786.92 KiB
  allocs estimate:  106
  --------------
  minimum time:     292.677 μs (0.00% GC)
  median time:      317.791 μs (0.00% GC)
  mean time:        338.971 μs (4.16% GC)
  maximum time:     1.498 ms (63.73% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> transpose(reduce(hcat,collect(homogenize(eachcol(M))))) == collect_homoiter(eachcol(M))
true

I think the problem with the first implementation is that you are allocating too many temporary arrays in the iterate function. One option would be to do

Base.@propagate_inbounds function Base.iterate(h::Homogenize, state)
    n = iterate(h.itr, state...)
    n === nothing && return n
    return pushfirst!(Vector(n[1]), oneunit(Base.eltype(n[1]))), n[2]
end

This does not work with your toy example (since one cannot push into the slices of a matrix), but I think in Nemo the exponent vectors are proper arrays, so this would work.
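To illustrate the point about slices (a minimal sketch, separate from Nemo): pushfirst! is not defined on a view, but it works after copying the slice into a fresh Vector:

```julia
M = [1 2; 3 4]
r = view(M, 1, :)    # a slice of a matrix: fixed size, cannot grow
# pushfirst!(r, 1) would throw a MethodError
v = pushfirst!(Vector(r), oneunit(eltype(r)))  # copy first, then prepend
# v == [1, 1, 2]
```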

(But I don’t think it will beat the second implementation. There is nothing faster than preallocating and then filling, although reduce(hcat, ...) is probably quite optimized in Base.)

Thank you so much. There is indeed some improvement. I adapted the toy example to pushfirst!; here are the new times. An aside question now: why is the memory estimate bigger for pushfirst!?

julia> N=100; M = rand(N,10*N); C=[M[i,:] for i in 1:N];

julia> @benchmark collect_homoiter(C)
BenchmarkTools.Trial: 
  memory estimate:  782.19 KiB
  allocs estimate:  4
  --------------
  minimum time:     178.375 μs (0.00% GC)
  median time:      195.195 μs (0.00% GC)
  mean time:        213.096 μs (4.10% GC)
  maximum time:     982.328 μs (78.91% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Using pushfirst!
julia> @benchmark transpose(reduce(hcat,collect(homogenize(C))))
BenchmarkTools.Trial: 
  memory estimate:  3.08 MiB
  allocs estimate:  603
  --------------
  minimum time:     316.035 μs (0.00% GC)
  median time:      354.262 μs (0.00% GC)
  mean time:        446.933 μs (15.52% GC)
  maximum time:     1.969 ms (61.69% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Using [oneunit(Base.eltype(n[1])); n[1]] 
julia> @benchmark transpose(reduce(hcat,collect(homogenize(C))))
BenchmarkTools.Trial: 
  memory estimate:  1.62 MiB
  allocs estimate:  2203
  --------------
  minimum time:     369.206 μs (0.00% GC)
  median time:      398.459 μs (0.00% GC)
  mean time:        465.969 μs (9.88% GC)
  maximum time:     3.922 ms (78.63% GC)
  --------------
  samples:          10000
  evals/sample:     1

I am just guessing here, but pushfirst! calls something called growbeg, which in turn calls the C function here: https://github.com/JuliaLang/julia/blob/0a77e07c4d964bcec172bde2918097111814a07e/src/array.c#L974. I can imagine that it allocates more space than just for one element (similar to how push! will make room for more than one element).
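As a small sanity check (a sketch, not a claim about growbeg internals): both spellings give the same result, and the difference is only in how the memory is obtained:

```julia
v = collect(1:5)
w = pushfirst!(copy(v), 0)   # mutate a copy in place; may reserve extra front capacity
u = vcat(0, v)               # allocate a fresh array of exactly the right size
# w == u == [0, 1, 2, 3, 4, 5]
```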
