Hi everyone,

For a project I’m working on I have a datastructure with an `AbstractArray`

interface that internally stores its data in a `Vector`

of `Vectors`

(more precisly, it uses a fixed-size arrays, but that doesn’t matter for the problem at hand). Now I would like to efficiently iterate over these elements as if it were a single array. Of course I could do this:

```
# For example, compute a sum
function sumiterate_pv(pv::PartitionedVector)
s = zero(eltype(pv))
for part in pv.partitions
for x in part
s += x
end
end
s
end
```

But this restricts the function to only accept `PartitionedVector`

types and I’d like something more general, such as:

```
function sumiterate(A::AbstractVector)
s = zero(eltype(A))
for x in A
s+= x
end
s
end
```

Now I know about `Iterators.flatten`

, but that performs a lot worse than the manually written double loop, so I decided to write my own `iterate`

function, in which I try to mimic the double loop behavior. This is already performing a lot better than `Iterators.flatten`

, but computing the sum using this `iterate`

function is still not as performant as the `sumiterate_pv`

example. By looking at the LLVM that is being produced for both versions I can see that `sumiterate_pv`

is compiled into SIMD code, which makes sense for computing a sum. However, this does not happen for loops using my custom `iterate`

function. I think it has something to do with the fact that the compiler doesn’t really see it as a double loop, but as a single loop with a branch in the body, which makes it hard to vectorize. Does anyone have any idea if I can somehow rewrite my `iterate`

function in a way that would compile down to SIMD code? Below is a (simplified) snippet of my code.

```
mutable struct PVIterateState{T}
num_partitions::Int
outer_idx::Int
cur_part::Vector{T}
cur_part_length::Int
inner_idx::Int
end
function Base.iterate(pv::PartitionedVector)
isempty(pv) && return nothing
cur_part = pv.partitions[1]
state = PVIterateState(length(pv.partitions), 1, cur_part, length(cur_part), 2)
(cur_part[1], state)
end
function Base.iterate(pv::PartitionedVector, s::PVIterateState)
if s.inner_idx <= s.cur_part_length # inner loop
@inbounds res = s.cur_part[s.inner_idx]
s.inner_idx += 1
(res, s)
elseif s.outer_idx < s.num_partitions # outer loop
s.outer_idx += 1
@inbounds s.cur_part = pv.partitions[s.outer_idx]
s.cur_part_length = length(s.cur_part)
@inbounds res = s.cur_part[1]
s.inner_idx = 2
(res, s)
else
nothing
end
end
```