# `collect` gives a different result than a `for` loop

I’ve somehow written an iterator where calling `collect` on a `PairwiseAlignment` returns a 5-element vector, while iterating over it in a `for` loop produces only three elements.

Context: I’m working on a patch for BioAlignments (MillironX/BioAlignments.jl on GitHub, branch `operations-fix`), where I modified `Base.iterate`. I don’t know how to debug this, since stepping into the iterator doesn’t cause the input to become correct again. Does anyone know how this could happen?

MWE:

```julia
using BioAlignments

anchors = [
    AlignmentAnchor(0, 0, 0, OP_START),
    AlignmentAnchor(2, 2, 2, OP_SEQ_MATCH),
    AlignmentAnchor(3, 3, 4, OP_SEQ_MATCH),
    AlignmentAnchor(3, 3, 5, OP_HARD_CLIP),
]
seq = AlignedSequence("ACG", anchors)
ref = "ACG"
aln = PairwiseAlignment(seq, ref)

@show collect(aln)

for (k, (i, j)) in enumerate(aln)
    @show k
    @show i
    @show j
    println(' ')
end
```

Output:

```
collect(aln) = [('A', 'A'), ('C', 'C'), ('G', 'G'), ('\0', '\0'), ('\0', '\0')]
k = 1
i = 'A'
j = 'A'

k = 2
i = 'C'
j = 'C'

k = 3
i = 'G'
j = 'G'
```

What is `length(aln)`? I’m guessing your discrepancy is there.


I’ve seen this behavior a few times. It usually happens when `length` reports one value but iterating actually produces fewer elements. `collect` preallocates its result based on `length`, and it doesn’t shrink the result to the number of elements actually produced (counting them may be slow).
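
To illustrate the mechanism, here is a minimal sketch with a hypothetical toy iterator `Overcount` (not from BioAlignments) whose `length` claims 5 elements while `iterate` stops after 3. It assumes Base’s default traits: `IteratorSize` is `HasLength()`, and since no `eltype` method is defined, the element type is `Any`.

```julia
# Hypothetical iterator: its `length` over-reports what `iterate` yields.
struct Overcount end

# Yields 1, 2, 3 and then stops.
Base.iterate(::Overcount, state = 1) = state > 3 ? nothing : (state, state + 1)

# Deliberately wrong: claims 5 elements. IteratorSize defaults to HasLength(),
# so `collect` trusts this value when it preallocates its result.
Base.length(::Overcount) = 5

v = collect(Overcount())                # 5 slots allocated, only 3 written
n_loop = count(_ -> true, Overcount())  # iterating like a `for` loop sees 3

@show length(v) n_loop
@show isassigned(v, 4)                  # slots 4 and 5 were never written
```

Because the element type here is `Any`, the two unwritten slots show up as `#undef`. With a concrete bits type such as `Tuple{Char,Char}` they would instead hold whatever the freshly allocated memory contained, which is presumably where the `('\0', '\0')` entries in the MWE come from.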

It would be interesting to add a `LengthBounded()` trait that iterators could opt into, allowing the resulting collection to shrink when fewer than `length(itr)` elements are actually produced.
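
A rough sketch of how such an opt-in might look. Everything here is hypothetical: `LengthStyle`, `LengthExact`, `LengthBounded`, `lengthstyle` and `bounded_collect` are illustration names, not part of Base, and `Overcount` is a toy iterator that claims 5 elements but yields 3.

```julia
# Hypothetical trait sketch: none of these names exist in Base.
abstract type LengthStyle end
struct LengthExact <: LengthStyle end    # `length(itr)` is exact (today's assumption)
struct LengthBounded <: LengthStyle end  # `length(itr)` is only an upper bound

# Default: iterators do not opt in.
lengthstyle(itr) = LengthExact()

bounded_collect(itr) = bounded_collect(lengthstyle(itr), itr)

# Exact lengths: defer to Base.collect, which preallocates `length(itr)` slots.
bounded_collect(::LengthExact, itr) = collect(itr)

# Upper-bound lengths: preallocate, fill, then shrink to the count produced.
function bounded_collect(::LengthBounded, itr)
    dest = Vector{eltype(itr)}(undef, length(itr))
    n = 0
    for x in itr
        n += 1
        dest[n] = x   # throws a BoundsError if the bound is violated
    end
    return resize!(dest, n)
end

# Toy iterator that opts in: claims 5 elements, yields 3.
struct Overcount end
Base.iterate(::Overcount, state = 1) = state > 3 ? nothing : (state, state + 1)
Base.length(::Overcount) = 5
lengthstyle(::Overcount) = LengthBounded()

bounded_collect(Overcount())  # returns a 3-element Vector{Any} holding 1, 2, 3
```

Dispatching on a trait keeps the existing preallocating path for iterators whose `length` is exact, and only pays for a final `resize!` when an iterator declares its `length` to be an upper bound.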

```julia
julia> length(aln)
5
```
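
Since `length(aln)` reports 5 while the `for` loop yields 3, one way to catch this kind of mismatch while debugging is to compare what `length` claims with how many elements iteration actually produces. The sketch below uses hypothetical names (`check_length`, `Overcount`, `Unsized`); with the patched `aln` from the MWE one would expect it to report 5 claimed vs. 3 actual. If an exact `length` is hard to compute, declaring `Base.IteratorSize(::Type{T}) = Base.SizeUnknown()` for the type is one standard alternative: `collect` then grows its result while iterating instead of preallocating.

```julia
# Hypothetical debugging helper: does `length` agree with iteration?
function check_length(itr)
    # Only meaningful for iterators that promise a length via IteratorSize.
    Base.IteratorSize(itr) isa Union{Base.HasLength, Base.HasShape} || return true
    claimed = length(itr)
    actual = count(_ -> true, itr)   # walks the iterator just like a `for` loop
    claimed == actual || @warn "length/iterate mismatch" claimed actual
    return claimed == actual
end

# Toy iterator: claims 5 elements, yields 3 (mirrors the symptom above).
struct Overcount end
Base.iterate(::Overcount, state = 1) = state > 3 ? nothing : (state, state + 1)
Base.length(::Overcount) = 5

# Alternative: declare the size unknown, so `collect` grows its result
# while iterating instead of preallocating `length(itr)` slots.
struct Unsized end
Base.iterate(::Unsized, state = 1) = state > 3 ? nothing : (state, state + 1)
Base.IteratorSize(::Type{Unsized}) = Base.SizeUnknown()

check_length(Overcount())   # warns and returns false
check_length(1:3)           # true
collect(Unsized())          # 3 elements: 1, 2, 3
```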