Hi everyone,
For a project I’m working on I have a datastructure with an AbstractArray
interface that internally stores its data in a Vector
of Vectors
(more precisly, it uses a fixed-size arrays, but that doesn’t matter for the problem at hand). Now I would like to efficiently iterate over these elements as if it were a single array. Of course I could do this:
# For example, compute a sum
function sumiterate_pv(pv::PartitionedVector)
s = zero(eltype(pv))
for part in pv.partitions
for x in part
s += x
end
end
s
end
But this restricts the function to only accept PartitionedVector
types and I’d like something more general, such as:
function sumiterate(A::AbstractVector)
s = zero(eltype(A))
for x in A
s+= x
end
s
end
Now I know about Iterators.flatten
, but that performs a lot worse than the manually written double loop, so I decided to write my own iterate
function, in which I try to mimic the double loop behavior. This is already performing a lot better than Iterators.flatten
, but computing the sum using this iterate
function is still not as performant as the sumiterate_pv
example. By looking at the LLVM that is being produced for both versions I can see that sumiterate_pv
is compiled into SIMD code, which makes sense for computing a sum. However, this does not happen for loops using my custom iterate
function. I think it has something to do with the fact that the compiler doesn’t really see it as a double loop, but as a single loop with a branch in the body, which makes it hard to vectorize. Does anyone have any idea if I can somehow rewrite my iterate
function in a way that would compile down to SIMD code? Below is a (simplified) snippet of my code.
mutable struct PVIterateState{T}
num_partitions::Int
outer_idx::Int
cur_part::Vector{T}
cur_part_length::Int
inner_idx::Int
end
function Base.iterate(pv::PartitionedVector)
isempty(pv) && return nothing
cur_part = pv.partitions[1]
state = PVIterateState(length(pv.partitions), 1, cur_part, length(cur_part), 2)
(cur_part[1], state)
end
function Base.iterate(pv::PartitionedVector, s::PVIterateState)
if s.inner_idx <= s.cur_part_length # inner loop
@inbounds res = s.cur_part[s.inner_idx]
s.inner_idx += 1
(res, s)
elseif s.outer_idx < s.num_partitions # outer loop
s.outer_idx += 1
@inbounds s.cur_part = pv.partitions[s.outer_idx]
s.cur_part_length = length(s.cur_part)
@inbounds res = s.cur_part[1]
s.inner_idx = 2
(res, s)
else
nothing
end
end