# Custom flatten version is faster than `flatten()`

Consider an array of matrices i.e. `a = [rand(3, 3) for i = 1:1000]`. I have two ways to “flatten” this array:

``````julia
function naive_flatten(a)
    ret = Array{Float64}(undef, 0)
    @inbounds for i = 1:length(a)
        append!(ret, vec(a[i]))
    end
    return ret   # without this, the function returns `nothing`
end

using BenchmarkTools, Base.Iterators

@btime aa = naive_flatten($a)
## built-in function
@btime aa = collect(flatten($a))

  61.487 μs (2012 allocations: 222.64 KiB)
  204.570 μs (18015 allocations: 819.22 KiB)
``````

Why is there such a big difference? My function shouldn’t even perform well, since I am not preallocating anything…
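As an aside (not part of the original posts): since the total element count can be computed up front, an even faster preallocating variant can be sketched. The name `prealloc_flatten` is made up here, and it assumes the elements are `Float64` arrays:

```julia
# Hypothetical preallocating variant: compute the total length first,
# then copy each matrix's elements into place with copyto!.
function prealloc_flatten(a)
    n = sum(length, a)               # total number of elements
    ret = Vector{Float64}(undef, n)  # single allocation
    k = 1
    for m in a
        copyto!(ret, k, m, 1, length(m))  # linear-index copy of m into ret
        k += length(m)
    end
    return ret
end
```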

I think the problem is that the current implementation of `Iterators.flatten` does not take advantage of the information available: it treats the inner arrays generically, as if their lengths were unknown. Indeed, if you look at the implementation, `flatten_iteratorsize` does not specialize on `Union{HasLength, HasShape}` unless the iterator’s element type is a tuple or a number.

I don’t know if there is a deeper motivation about that.
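This can be checked directly at the REPL by querying the `IteratorSize` trait of a flattened iterator:

```julia
using Base.Iterators

# An array of arrays: the element type is Vector{Float64}, so the flattened
# length is reported as unknown, even though it could be computed.
Base.IteratorSize(flatten([rand(3) for _ in 1:2]))  # Base.SizeUnknown()

# A container of tuples (or numbers) does get a known length:
Base.IteratorSize(flatten([(1, 2), (3, 4)]))        # Base.HasLength()
```

With `SizeUnknown`, `collect` cannot preallocate and must grow the output with `push!` instead.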

This is not so much about `flatten_iteratorsize` but about the performance difference between

``````julia
julia> function naive_flatten2(a)
           ret = Array{Float64}(undef, 0)
           for ai in a
               for aij in ai
                   push!(ret, aij)
               end
           end
           ret
       end
naive_flatten2 (generic function with 1 method)

julia> function naive_flatten3(a)
           ret = Array{Float64}(undef, 0)
           for ai in Iterators.flatten(a)
               push!(ret, ai)
           end
           ret
       end
naive_flatten3 (generic function with 1 method)

julia> @btime aa = naive_flatten2($a);
  100.138 μs (14 allocations: 256.70 KiB)

julia> @btime aa = naive_flatten3($a);
  239.802 μs (18015 allocations: 819.22 KiB)
``````
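As a practical workaround (not mentioned in the original posts): `reduce(vcat, ...)` has a specialized method for arrays of arrays that allocates the full result once. Combined with `vec`, which reshapes each matrix without copying, it gives a fast one-liner, assuming all elements are `Float64` arrays:

```julia
a = [rand(3, 3) for _ in 1:1000]

# vec.(a) produces a Vector{Vector{Float64}} (reshapes, no element copies);
# reduce(vcat, ::Vector{<:AbstractVector}) hits a specialized method
# that preallocates the concatenated result.
aa = reduce(vcat, vec.(a))
length(aa)  # 9000
```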

Is this something worth opening an issue for?

FWIW, I’d think it is.