Functional implementation of collect

I am working on something that is in a sense a generalization of collect (except it would pick named tuples apart into vectors, apply some simple RLE compression, etc.), and I wanted to understand the type widening part better. For my problem, I do not know the element type or the size ex ante, so I define the following iterator type to make the results comparable (all the code is available in a gist for quick copy&paste):

# define an iterator with unknown type and length
struct MyItr
    N::Int
end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
    if state > itr.N
        nothing
    else
        val = state == 200 ? missing : state  # element 200 is missing, forcing the eltype to widen
        (val, state + 1)
    end
end
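
To make the widening concrete, collecting this iterator should yield a plain Vector{Int} as long as no missing is encountered, and widen once element 200 is reached:

eltype(collect(MyItr(100)))   # Int64: no missing encountered
eltype(collect(MyItr(1000)))  # Union{Missing, Int64}: element 200 forced the widening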

Then I try a “functional” implementation, where a new container is returned along with a flag that tells the caller whether its type changed; if it did, we recurse so that the collection's type stays fixed within each call to the helper function:

# "functional" implementation
store(collection::Vector{T}, elt::S) where {T, S <: T} =
    (push!(collection, elt); (collection, false))

store(collection, elt) = (vcat(collection, elt), true)

function mycollect_fun(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_fun([elt], itr, state)
end

function _mycollect_fun(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        collection, changed = store(collection, elt)
        changed && return _mycollect_fun(collection, itr, state)
    end
end
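
For illustration, store either pushes in place or hands back a widened copy, and the flag tells the caller which of the two happened:

store([1, 2], 3)        # pushes in place, returns ([1, 2, 3], false)
store([1, 2], missing)  # vcat allocates a Vector{Union{Missing, Int64}}, returns it with true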

Finally, a “stateful” implementation, in which the caller asks the collection whether it can accommodate the new value and, if not, expands it:

# "stateful" implementation
canstore(collection::Vector{T}, elt::S) where {T, S <: T} = true
canstore(collection, elt) = false
store!(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)

function expand(collection::Vector{T}, elt::S) where {T,S}
    U = Base.promote_typejoin(T, S)
    newcollection = Vector{U}(undef, length(collection))
    copyto!(newcollection, collection)
    push!(newcollection, elt)
    newcollection
end

function mycollect_mut(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_mut([elt], itr, state)
end

function _mycollect_mut(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        !canstore(collection, elt) && return _mycollect_mut(expand(collection, elt), itr, state)
        store!(collection, elt)
    end
end
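
Again for illustration, the query and the expansion look like this on a small example:

canstore([1, 2], 3)        # true: an Int fits into a Vector{Int}
canstore([1, 2], missing)  # false: the caller has to expand first
expand([1, 2], missing)    # Union{Missing, Int64}[1, 2, missing]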

Now some benchmarks:

julia> using BenchmarkTools

julia> itr = MyItr(1000)
MyItr(1000)

julia> @btime collect($itr);
  60.646 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun($itr);
  36.679 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut($itr);
  39.187 μs (990 allocations: 46.30 KiB)

julia> VERSION
v"1.1.0-DEV.281"

Surprisingly, the functional version is the fastest (which is great news, as it is the easiest to adapt to my more complicated problem), and even more surprisingly, it is faster than Base.collect.

Am I mismeasuring something, e.g. because the MWE is so trivial? And if this is a viable pattern, can I make this even faster?


It seems that simply testing whether a new collection was allocated, using ≡ (object identity), generates super-efficient code, e.g.

# "functional" implementation take 2
store!_or_widen(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)
store!_or_widen(collection, elt) = vcat(collection, elt)

function mycollect_funeq(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        expanded = store!_or_widen(collection, elt)
        expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
    end
end
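
For illustration, the identity check distinguishes the two cases without any extra bookkeeping:

v = [1, 2]
store!_or_widen(v, 3) ≡ v        # true: pushed in place, still the same vector
store!_or_widen(v, missing) ≡ v  # false: vcat returned a new, widened vector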

The new version benchmarks as

julia> @btime collect($itr);
  60.238 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun($itr);
  37.332 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut($itr);
  41.366 μs (990 allocations: 46.30 KiB)

julia> @btime mycollect_funeq($itr); # this is the new one
  28.801 μs (520 allocations: 39.64 KiB)


I haven’t yet taken the time to fully understand your implementation, but I wanted to mention that for the “pick named tuples apart into vectors” part, IIUC there is an efficient implementation in Tables.jl: you would use it as buildcolumns(nothing, itr) (nothing because the first argument is where you would put the NamedTuple schema if it were known in advance). Being able to do this kind of collect, with the element type changing and not known in advance, is quite crucial for tabular structures, for example to collect an iterator of NamedTuples, some of whose fields could be missing, into a NamedTuple of vectors. Tables.jl has a strong focus on performance, so it will definitely be interesting to compare what you come up with against what is in Tables.jl.
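
Roughly, the usage should look something like this (a sketch, untested here; the exact return value of buildcolumns may differ between Tables.jl versions):

using Tables

# an iterator of NamedTuples where one field is sometimes missing
rows = ((a = i == 2 ? missing : i, b = float(i)) for i in 1:3)
cols = Tables.buildcolumns(nothing, rows)  # nothing: no schema known in advance
Tables.columntable(cols)                   # a NamedTuple of vectors; :a widens to Union{Missing, Int64}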

Impressive work! I was just looking into this a bit, and found that in Julia 1.2, collect is by far the most efficient version. Compare Julia 1.0.2 and 1.1.0-rc1.0:

julia> @btime collect($itr);
  37.825 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun($itr);
  37.072 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut($itr);
  42.494 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq($itr);
  28.066 μs (521 allocations: 39.59 KiB)

With Julia 1.2.0-DEV.94:

julia> @btime collect($itr);
  10.110 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun($itr);
  37.525 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut($itr);
  41.951 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq($itr);
  28.533 μs (521 allocations: 39.59 KiB)

Not sure what has changed, perhaps this PR?

Thanks for re-benchmarking. Apparently accepting elt via a separate type parameter S <: T invokes a lot of type inference unnecessarily; restricting the first method to elt::T eliminates this and closes almost all of the gap between Base.collect and my code (see # IMPORTANT CHANGE HERE in the code below).

However, since this is meant for generic collections, if I level the playing field by using Base.push_widen for the widening step, the version below ends up faster than Base.collect.

Code

(the whole thing repeated, to make it self-contained)

using BenchmarkTools

# define an iterator with unknown type and length
struct MyItr
    N::Int
end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
    if state > itr.N
        nothing
    else
        val = state == 200 ? missing : state  # element 200 is missing, forcing the eltype to widen
        (val, state + 1)
    end
end

# IMPORTANT CHANGE HERE
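# (Base.push_widen is an internal helper, so its behavior could change between Julia
# versions: it builds a new collection with a widened element type holding the old
# elements plus elt, which is what the ≡ identity test below relies on.)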
store!_or_widen(collection::Vector{T}, elt::T) where {T} = push!(collection, elt)
store!_or_widen(collection, elt) = Base.push_widen(collection, elt)

function mycollect_funeq(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        expanded = store!_or_widen(collection, elt)
        expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
    end
end

itr = MyItr(1000)

@btime collect($itr);
@btime mycollect_funeq($itr);

Benchmarks

julia> @btime collect($itr);
  10.838 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_funeq($itr);
  10.343 μs (13 allocations: 30.81 KiB)

julia> VERSION
v"1.2.0-DEV.300"