Consider an array of matrices, e.g. a = [rand(3, 3) for i = 1:1000]. I have two ways to "flatten" this array:
function naive_flatten(a)
    ret = Array{Float64}(undef, 0)
    @inbounds for i = 1:length(a)
        # append each matrix's elements, letting ret grow as needed
        append!(ret, vec(a[i]))
    end
    return ret
end
using BenchmarkTools; using Base.Iterators;
@btime aa = naive_flatten($a)        # 61.487 μs (2012 allocations: 222.64 KiB)
## built-in function
@btime aa = collect(flatten($a))     # 204.570 μs (18015 allocations: 819.22 KiB)
Why is there such a big difference? My function shouldn't even perform well, since I'm not preallocating anything…
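For reference, here is a minimal sketch of what a preallocated version could look like (prealloc_flatten is just a hypothetical name for illustration, not something from Base), since the total output length sum(length, a) is known up front:

function prealloc_flatten(a)
    # hypothetical sketch: preallocate the full output, then copy each
    # matrix's elements into its slice of the result
    ret = Vector{Float64}(undef, sum(length, a))
    offset = 0
    for ai in a
        copyto!(ret, offset + 1, ai, 1, length(ai))
        offset += length(ai)
    end
    return ret
end

@btime aa = prealloc_flatten($a);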
I think the problem is that the current implementation of Iterators.flatten does not take advantage of this information: first of all, it treats arrays generically, as if we didn't know their actual length. You can see this in flatten_iteratorsize.
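To see this concretely, here is a quick REPL check (a sketch, run against the same a as above): the outer array reports its shape, but the flattened iterator reports SizeUnknown(), so collect cannot preallocate the output.

julia> Base.IteratorSize(a)                      # the outer array knows its shape
Base.HasShape{1}()

julia> Base.IteratorSize(Iterators.flatten(a))   # the flattened iterator does not
Base.SizeUnknown()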
This is not so much about flatten_iteratorsize as about the performance difference between the following two implementations:
julia> function naive_flatten2(a)
           ret = Array{Float64}(undef, 0)
           for ai in a
               for aij in ai
                   push!(ret, aij)
               end
           end
           ret
       end
naive_flatten2 (generic function with 1 method)
julia> function naive_flatten3(a)
           ret = Array{Float64}(undef, 0)
           for ai in Iterators.flatten(a)
               push!(ret, ai)
           end
           ret
       end
naive_flatten3 (generic function with 1 method)
julia> @btime aa = naive_flatten2($a);
100.138 μs (14 allocations: 256.70 KiB)
julia> @btime aa = naive_flatten3($a);
239.802 μs (18015 allocations: 819.22 KiB)