# Functional implementation of collect

I am working on something that is in a sense a generalization of `collect` (except it would pick named tuples apart into vectors, apply some simple RLE compression, etc), and I wanted to understand the type widening part better. For my problem, I do not know the element type or the size ex ante, so the following type is defined to make results comparable (all the code is available in a gist for quick copy&paste):

``````# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
if state > itr.N
nothing
else
val = state == 200 ? missing : state
(val, state + 1)
end
end
``````

Then I try a â€śfunctionalâ€ť implementation, where a new container is returned, along with a flag that tells the caller if it changed type, in which case we recurse so that the type does not change within the function:

``````# "functional" implementation
store(collection::Vector{T}, elt::S) where {T, S <: T} =
(push!(collection, elt); (collection, false))

store(collection, elt) = (vcat(collection, elt), true)

function mycollect_fun(itr)
y = iterate(itr)
y â‰ˇ nothing && error("not allowed")
elt, state = y
_mycollect_fun([elt], itr, state)
end

function _mycollect_fun(collection, itr, state)
while true
y = iterate(itr, state)
y â‰ˇ nothing && return collection
elt, state = y
collection, changed = store(collection, elt)
changed && return _mycollect_fun(collection, itr, state)
end
end
``````

Finally, a â€śstatefulâ€ť implementation, in which the caller queries the collection if it can accommodate the new value, and if not, expand:

``````# "stateful" implementation
canstore(collection::Vector{T}, elt::S) where {T, S <: T} = true
canstore(collection, elt) = false
store!(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)

function expand(collection::Vector{T}, elt::S) where {T,S}
U = Base.promote_typejoin(T, S)
newcollection = Vector{U}(undef, length(collection))
copyto!(newcollection, collection)
push!(newcollection, elt)
newcollection
end

function mycollect_mut(itr)
y = iterate(itr)
y â‰ˇ nothing && error("not allowed")
elt, state = y
_mycollect_mut([elt], itr, state)
end

function _mycollect_mut(collection, itr, state)
while true
y = iterate(itr, state)
y â‰ˇ nothing && return collection
elt, state = y
!canstore(collection, elt) && return _mycollect_mut(expand(collection, elt), itr, state)
store!(collection, elt)
end
end
``````

Now some benchmarks:

``````julia> using BenchmarkTools

julia> itr = MyItr(1000)
MyItr(1000)

julia> @btime collect(\$itr);
60.646 ÎĽs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun(\$itr);
36.679 ÎĽs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut(\$itr);
39.187 ÎĽs (990 allocations: 46.30 KiB)

julia> VERSION
v"1.1.0-DEV.281"
``````

Surprisingly, the functional version is fastest (which is great news, as it is the easiest for me to implement for the complicated problem), and even more surprisingly, it is faster than `Base.collect`.

Am I mismeasuring something, eg because the MWE is so trivial? And if this is a viable pattern, can I make this even faster?

4 Likes

It seems that simply testing if a new collection was allocated with `isequal` generates super-efficient code, eg

``````# "functional" implementation take 2
store!_or_widen(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)
store!_or_widen(collection, elt) = vcat(collection, elt)

function mycollect_funeq(itr)
y = iterate(itr)
y â‰ˇ nothing && error("not allowed")
elt, state = y
_mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
while true
y = iterate(itr, state)
y â‰ˇ nothing && return collection
elt, state = y
expanded = store!_or_widen(collection, elt)
expanded â‰ˇ collection || return _mycollect_funeq(expanded, itr, state)
end
end
``````

benchmarks as

``````julia> @btime collect(\$itr);
60.238 ÎĽs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun(\$itr);
37.332 ÎĽs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut(\$itr);
41.366 ÎĽs (990 allocations: 46.30 KiB)

julia> @btime mycollect_funeq(\$itr); # this is the new one
28.801 ÎĽs (520 allocations: 39.64 KiB)
``````

1 Like

I havenâ€™t yet taken the time to fully understand your implementation, but wanted to mention that for the â€śpick named tuples apart into vectorsâ€ť IIUC there is an efficient implementation here in Tables: you would use it with `buildcolumns(nothing, itr)` (`nothing` because the first argument is where youâ€™d put the `NamedTuple` structure if it was known in advance). It is quite crucial to be able to do this `collect` with element type changing and not known in advance for tabular structures, for example to collect an iterator of `NamedTuples` where some fields could be missing in a `NamedTuple` of vectors. Tables seems to have a strong focus on performance so itâ€™ll definitely be interesting to compare what you come up with and what is in Tables.jl

Impressive work! I was just looking into this a bit, and found that in Julia 1.2, `collect` is by far the most efficient version. Compare Julia 1.0.2 and 1.1.0-rc1.0:

``````julia> @btime collect(\$itr);
37.825 ÎĽs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun(\$itr);
37.072 ÎĽs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut(\$itr);
42.494 ÎĽs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq(\$itr);
28.066 ÎĽs (521 allocations: 39.59 KiB)
``````

With Julia 1.2.0-DEV.94:

``````julia> @btime collect(\$itr);
10.110 ÎĽs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun(\$itr);
37.525 ÎĽs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut(\$itr);
41.951 ÎĽs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq(\$itr);
28.533 ÎĽs (521 allocations: 39.59 KiB)
``````

Not sure what has changed, perhaps this PR?

Thanks for re-benchmarking. Apparently accepting an `elt` with `S <: T` invokes a lot of type inference unnecessarily, which I now eliminated, closing almost all of the gap between `Base.collect` and my code (see `# IMPORTANT CHANGE HERE` in the code below).

However, since this is for generic collections, if I level the playing field by using `Base.push_widen`, the version below is faster than `Base.collect`.

# Code

(the whole thing repeated, to make it self-contained)

``````using BenchmarkTools

# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
if state > itr.N
nothing
else
val = state == 200 ? missing : state
(val, state + 1)
end
end

# IMPORTANT CHANGE HERE
store!_or_widen(collection::Vector{T}, elt::T) where {T} = push!(collection, elt)
store!_or_widen(collection, elt) = Base.push_widen(collection, elt)

function mycollect_funeq(itr)
y = iterate(itr)
y â‰ˇ nothing && error("not allowed")
elt, state = y
_mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
while true
y = iterate(itr, state)
y â‰ˇ nothing && return collection
elt, state = y
expanded = store!_or_widen(collection, elt)
expanded â‰ˇ collection || return _mycollect_funeq(expanded, itr, state)
end
end

itr = MyItr(1000)

@btime collect(\$itr);
@btime mycollect_funeq(\$itr);
``````

# Benchmarks

``````julia> @btime collect(\$itr);
10.838 ÎĽs (15 allocations: 30.89 KiB)

julia> @btime mycollect_funeq(\$itr);
10.343 ÎĽs (13 allocations: 30.81 KiB)

julia> VERSION
v"1.2.0-DEV.300"
``````
2 Likes