# Functional implementation of collect

I am working on something that is in a sense a generalization of `collect` (except it would pick named tuples apart into vectors, apply some simple RLE compression, etc), and I wanted to understand the type widening part better. For my problem, I do not know the element type or the size ex ante, so the following type is defined to make results comparable (all the code is available in a gist for quick copy&paste):

``````# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
if state > itr.N
nothing
else
val = state == 200 ? missing : state
(val, state + 1)
end
end
``````

Then I try a “functional” implementation, where a new container is returned, along with a flag that tells the caller if it changed type, in which case we recurse so that the type does not change within the function:

``````# "functional" implementation
store(collection::Vector{T}, elt::S) where {T, S <: T} =
(push!(collection, elt); (collection, false))

store(collection, elt) = (vcat(collection, elt), true)

function mycollect_fun(itr)
y = iterate(itr)
y ≡ nothing && error("not allowed")
elt, state = y
_mycollect_fun([elt], itr, state)
end

function _mycollect_fun(collection, itr, state)
while true
y = iterate(itr, state)
y ≡ nothing && return collection
elt, state = y
collection, changed = store(collection, elt)
changed && return _mycollect_fun(collection, itr, state)
end
end
``````

Finally, a “stateful” implementation, in which the caller queries the collection if it can accommodate the new value, and if not, expand:

``````# "stateful" implementation
canstore(collection::Vector{T}, elt::S) where {T, S <: T} = true
canstore(collection, elt) = false
store!(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)

function expand(collection::Vector{T}, elt::S) where {T,S}
U = Base.promote_typejoin(T, S)
newcollection = Vector{U}(undef, length(collection))
copyto!(newcollection, collection)
push!(newcollection, elt)
newcollection
end

function mycollect_mut(itr)
y = iterate(itr)
y ≡ nothing && error("not allowed")
elt, state = y
_mycollect_mut([elt], itr, state)
end

function _mycollect_mut(collection, itr, state)
while true
y = iterate(itr, state)
y ≡ nothing && return collection
elt, state = y
!canstore(collection, elt) && return _mycollect_mut(expand(collection, elt), itr, state)
store!(collection, elt)
end
end
``````

Now some benchmarks:

``````julia> using BenchmarkTools

julia> itr = MyItr(1000)
MyItr(1000)

julia> @btime collect(\$itr);
60.646 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun(\$itr);
36.679 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut(\$itr);
39.187 μs (990 allocations: 46.30 KiB)

julia> VERSION
v"1.1.0-DEV.281"
``````

Surprisingly, the functional version is fastest (which is great news, as it is the easiest for me to implement for the complicated problem), and even more surprisingly, it is faster than `Base.collect`.

Am I mismeasuring something, eg because the MWE is so trivial? And if this is a viable pattern, can I make this even faster?

4 Likes

It seems that simply testing if a new collection was allocated with `isequal` generates super-efficient code, eg

``````# "functional" implementation take 2
store!_or_widen(collection::Vector{T}, elt::S) where {T, S <: T} = push!(collection, elt)
store!_or_widen(collection, elt) = vcat(collection, elt)

function mycollect_funeq(itr)
y = iterate(itr)
y ≡ nothing && error("not allowed")
elt, state = y
_mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
while true
y = iterate(itr, state)
y ≡ nothing && return collection
elt, state = y
expanded = store!_or_widen(collection, elt)
expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
end
end
``````

benchmarks as

``````julia> @btime collect(\$itr);
60.238 μs (14 allocations: 30.94 KiB)

julia> @btime mycollect_fun(\$itr);
37.332 μs (1520 allocations: 70.88 KiB)

julia> @btime mycollect_mut(\$itr);
41.366 μs (990 allocations: 46.30 KiB)

julia> @btime mycollect_funeq(\$itr); # this is the new one
28.801 μs (520 allocations: 39.64 KiB)
``````

1 Like

I haven’t yet taken the time to fully understand your implementation, but wanted to mention that for the “pick named tuples apart into vectors” IIUC there is an efficient implementation here in Tables: you would use it with `buildcolumns(nothing, itr)` (`nothing` because the first argument is where you’d put the `NamedTuple` structure if it was known in advance). It is quite crucial to be able to do this `collect` with element type changing and not known in advance for tabular structures, for example to collect an iterator of `NamedTuples` where some fields could be missing in a `NamedTuple` of vectors. Tables seems to have a strong focus on performance so it’ll definitely be interesting to compare what you come up with and what is in Tables.jl

Impressive work! I was just looking into this a bit, and found that in Julia 1.2, `collect` is by far the most efficient version. Compare Julia 1.0.2 and 1.1.0-rc1.0:

``````julia> @btime collect(\$itr);
37.825 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun(\$itr);
37.072 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut(\$itr);
42.494 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq(\$itr);
28.066 μs (521 allocations: 39.59 KiB)
``````

With Julia 1.2.0-DEV.94:

``````julia> @btime collect(\$itr);
10.110 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_fun(\$itr);
37.525 μs (1520 allocations: 70.81 KiB)

julia> @btime mycollect_mut(\$itr);
41.951 μs (991 allocations: 46.25 KiB)

julia> @btime mycollect_funeq(\$itr);
28.533 μs (521 allocations: 39.59 KiB)
``````

Not sure what has changed, perhaps this PR?

Thanks for re-benchmarking. Apparently accepting an `elt` with `S <: T` invokes a lot of type inference unnecessarily, which I now eliminated, closing almost all of the gap between `Base.collect` and my code (see `# IMPORTANT CHANGE HERE` in the code below).

However, since this is for generic collections, if I level the playing field by using `Base.push_widen`, the version below is faster than `Base.collect`.

# Code

(the whole thing repeated, to make it self-contained)

``````using BenchmarkTools

# define an iterator with unknown type and length
struct MyItr N::Int end
Base.IteratorSize(::Type{MyItr}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{MyItr}) = Base.EltypeUnknown()
function Base.iterate(itr::MyItr, state = 1)
if state > itr.N
nothing
else
val = state == 200 ? missing : state
(val, state + 1)
end
end

# IMPORTANT CHANGE HERE
store!_or_widen(collection::Vector{T}, elt::T) where {T} = push!(collection, elt)
store!_or_widen(collection, elt) = Base.push_widen(collection, elt)

function mycollect_funeq(itr)
y = iterate(itr)
y ≡ nothing && error("not allowed")
elt, state = y
_mycollect_funeq([elt], itr, state)
end

function _mycollect_funeq(collection, itr, state)
while true
y = iterate(itr, state)
y ≡ nothing && return collection
elt, state = y
expanded = store!_or_widen(collection, elt)
expanded ≡ collection || return _mycollect_funeq(expanded, itr, state)
end
end

itr = MyItr(1000)

@btime collect(\$itr);
@btime mycollect_funeq(\$itr);
``````

# Benchmarks

``````julia> @btime collect(\$itr);
10.838 μs (15 allocations: 30.89 KiB)

julia> @btime mycollect_funeq(\$itr);
10.343 μs (13 allocations: 30.81 KiB)

julia> VERSION
v"1.2.0-DEV.300"
``````
2 Likes