Functional implementation of collect

question

#1

I am working on something that is in a sense a generalization of collect (except it would pick named tuples apart into vectors, apply some simple RLE compression, etc), and I wanted to understand the type widening part better. For my problem, I do not know the element type or the size ex ante, so the following type is defined to make results comparable (all the code is available in a gist for quick copy&paste):

# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
    if state > itr.N
        nothing
    else
        val = state == 200 ? missing : state
        (val, state + 1)
    end
end

Then I try a “functional” implementation, where a new container is returned, along with a flag that tells the caller if it changed type, in which case we recurse so that the type does not change within the function:

# "functional" implementation
store(collection::Vector{T}, elt::S) where {T, S <: T} =
    (push!(collection, elt); (collection, false))

store(collection, elt) = (vcat(collection, elt), true)

function mycollect_fun(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_fun([elt], itr, state)
end

function _mycollect_fun(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        collection, changed = store(collection, elt)
        changed && return _mycollect_fun(collection, itr, state)
    end
end

Finally, a “stateful” implementation, in which the caller queries the collection if it can accommodate the new value, and if not, expand:

# "stateful" implementation
canstore(collection::Vector{T}, elt::S) where {T, S <: T} = true
canstore(collection, elt) = false
store!(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)

function expand(collection::Vector{T}, elt::S) where {T,S}
    U = Base.promote_typejoin(T, S)
    newcollection = Vector{U}(undef, length(collection))
    copyto!(newcollection, collection)
    push!(newcollection, elt)
    newcollection
end

function mycollect_mut(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_mut([elt], itr, state)
end

function _mycollect_mut(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        !canstore(collection, elt) && return _mycollect_mut(expand(collection, elt), itr, state)
        store!(collection, elt)
    end
end

Now some benchmarks:

julia> using BenchmarkTools

julia> itr = MyItr(1000)
MyItr(1000)

julia> @btime collect($itr);
  60.646 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun($itr);
  36.679 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut($itr);
  39.187 μs (990 allocations: 46.30 KiB)

julia> VERSION
v"1.1.0-DEV.281"

Surprisingly, the functional version is fastest (which is great news, as it is the easiest for me to implement for the complicated problem), and even more surprisingly, it is faster than Base.collect.

Am I mismeasuring something, eg because the MWE is so trivial? And if this is a viable pattern, can I make this even faster?


Should Generators finally be given eltype
Getting the type of `f(x)` at compile time (without evaluating `f(x)`), for type-stable functions
How do I get the expected return type of a polymorphic function at compile time?
#2

It seems that simply testing if a new collection was allocated with isequal generates super-efficient code, eg

# "functional" implementation take 2
store!_or_widen(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)
store!_or_widen(collection, elt) = vcat(collection, elt)

function mycollect_funeq(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        expanded = store!_or_widen(collection, elt)
        expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
    end
end

benchmarks as

julia> @btime collect($itr);
  60.238 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun($itr);
  37.332 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut($itr);
  41.366 μs (990 allocations: 46.30 KiB)

julia> @btime mycollect_funeq($itr); # this is the new one
  28.801 μs (520 allocations: 39.64 KiB)


Should Generators finally be given eltype
#3

I haven’t yet taken the time to fully understand your implementation, but wanted to mention that for the “pick named tuples apart into vectors” IIUC there is an efficient implementation here in Tables: you would use it with buildcolumns(nothing, itr) (nothing because the first argument is where you’d put the NamedTuple structure if it was known in advance). It is quite crucial to be able to do this collect with element type changing and not known in advance for tabular structures, for example to collect an iterator of NamedTuples where some fields could be missing in a NamedTuple of vectors. Tables seems to have a strong focus on performance so it’ll definitely be interesting to compare what you come up with and what is in Tables.jl


#4

Impressive work! I was just looking into this a bit, and found that in Julia 1.2, collect is by far the most efficient version. Compare Julia 1.0.2 and 1.1.0-rc1.0:

julia> @btime collect($itr);
  37.825 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun($itr);
  37.072 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut($itr);
  42.494 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq($itr);
  28.066 μs (521 allocations: 39.59 KiB)

With Julia 1.2.0-DEV.94:

julia> @btime collect($itr);
  10.110 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun($itr);
  37.525 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut($itr);
  41.951 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq($itr);
  28.533 μs (521 allocations: 39.59 KiB)

Not sure what has changed, perhaps this PR?


#5

Thanks for re-benchmarking. Apparently accepting an elt with S <: T invokes a lot of type inference unnecessarily, which I now eliminated, closing almost all of the gap between Base.collect and my code (see # IMPORTANT CHANGE HERE in the code below).

However, since this is for generic collections, if I level the playing field by using Base.push_widen, the version below is faster than Base.collect.

Code

(the whole thing repeated, to make it self-contained)

using BenchmarkTools

# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
    if state > itr.N
        nothing
    else
        val = state == 200 ? missing : state
        (val, state + 1)
    end
end

# IMPORTANT CHANGE HERE
store!_or_widen(collection::Vector{T}, elt::T) where {T} = push!(collection, elt)
store!_or_widen(collection, elt) = Base.push_widen(collection, elt)

function mycollect_funeq(itr)
    y = iterate(itr)
    y ≡ nothing && error("not allowed")
    elt, state = y
    _mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
    while true
        y = iterate(itr, state)
        y ≡ nothing && return collection
        elt, state = y
        expanded = store!_or_widen(collection, elt)
        expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
    end
end

itr = MyItr(1000)

@btime collect($itr);
@btime mycollect_funeq($itr);

Benchmarks

julia> @btime collect($itr);
  10.838 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_funeq($itr);
  10.343 μs (13 allocations: 30.81 KiB)

julia> VERSION
v"1.2.0-DEV.300"