I have two independant vectors, which change in size over time, but sometimes I need to iterate over them both. I tried vcat, Iterators.flatten and CatViews and vcat needs the least amount of runtime and memory, quite the opposite of what I was expecting.
using BenchmarkTools, CatViews
function mysum(x)
s = zero(eltype(x))
for e in x
s += e # only a dummy operation
end
s
end
a = rand(1_000);
b = rand(1_000);
@btime vcatvec = vcat(a, b);
1.809 μs (1 allocation: 15.75 KiB)
@btime mysum(vcatvec)
2.093 μs (1 allocation: 16 bytes)
@btime flatted = Base.Iterators.flatten((a, b));
268.347 ns (2 allocations: 48 bytes)
@btime mysum(flatted)
30.454 μs (4001 allocations: 125.02 KiB)
@btime catview = CatView(a, b);
492.593 ns (7 allocations: 256 bytes)
@btime mysum(catview)
911.768 μs (12989 allocations: 765.66 KiB)
I used Julia 1.0.0 and triggered the compilation bevor all measurments.
Is there a faster way for iterating over two vectors?
Yes,
computers don’t work the way we like to think they do.
Cache-friendlyness and Branch-prediction friendlness matters an immense amount.
when you do a vcat you pay the one off price for the coping of everything to contagious memory,
but then you get to be doing an operation that the CPU is really well optimized for.
(This is a hardware thing, not a julia thing)
julia> function broadcast_sum(x,y)
output = zero(eltype(x))
for i in eachindex(x)
output += x[i]+y[i]
end
output
end
broadcast_sum (generic function with 1 method)
julia> @btime broadcast_sum(a,b)
1.007 μs (1 allocation: 16 bytes)
julia> @btime broadcast_sum(a,b)
996.286 ns (1 allocation: 16 bytes)
I don’t understand a couple of implementation details in CatViews:
How should getindex be fast at all? It’s doesn’t properly propagate bounds checking and a for-loop is called for each access. Using the tuple getindex accessor makes it faster, but its still type unstable.
As already noticed it isn’t typestable - maybe this came with the 1.0 update, where a lot of type params declarations were removed.
The perf.jl file doesn’t actually contain usable comparison benchmarks.
What’s the purpose of the inner field in CatView? It’s never used?
I’d stay away from CatViews until it has proven that the original benefits are still valid in Julia 1.0.
Maybe @rdeits can chime in on performance problems with CatView, as he upgrade CatViews for Julia 1.0.
For anyone who isn’t aware,
it is worth noting CatViews is quiet old.
It was designed for Julia 0.3 IIRC.
I think it’s type stability issues actually came from the 0.3->0.4 transition.
(or maybe 0.4->0.5)
RecurisveArrayTools.jl is a good alternative, IIRC.
But I’ve not benchmarked it on this.
Imo the really annoying thing is that the flattened iterator allocates during iteration, which makes mysum(flattened) so slow. Looking at @code_warntype, the bad object is of type ::Union{Nothing, Tuple{Float64,Tuple{Int64,Array{Float64,1},Int64}}}.
In the case of flattening over a tuple of iterators or an array of iterators (anything that can be indexed with Int), there is no good reason for having an iterator-state that is not bitstype (it should be a tuple of integers, such that the next element is iterate(flattened_iterators[i], j)). So I’d say that this is either a problem of Base.Iterators.flatten or yet another manifestation of the problem with heap-allocating small immutable non-bitstypes.