`collect` gives different result than `for` loop

MillironX · May 4, 2022, 8:58pm

I’ve somehow created an iterator function where calling collect on the Alignment struct returns a 5-element vector, while iterating produces only three elements.

Context: I’m working on a patch for BioAlignments (https://github.com/MillironX/BioAlignments.jl/tree/operations-fix), where I modified the Base.iterate function. I don’t know how to debug this, since stepping into the iterator doesn’t cause the input to become correct again. Does anyone know how this could happen?

Iterator code: https://github.com/MillironX/BioAlignments.jl/blob/5a8cb6bb3179e00adf7f750e26952257ca70f810/src/pairwise/alignment.jl#L17-L73

MWE:

using BioAlignments

anchors = [
    AlignmentAnchor(0, 0, 0, OP_START),
    AlignmentAnchor(2, 2, 2, OP_SEQ_MATCH),
    AlignmentAnchor(2, 2, 3, OP_PAD),
    AlignmentAnchor(3, 3, 4, OP_SEQ_MATCH),
    AlignmentAnchor(3, 3, 5, OP_HARD_CLIP),
]
seq = AlignedSequence("ACG", anchors)
ref = "ACG"
aln = PairwiseAlignment(seq, ref)

@show collect(aln)

for (k, (i,j)) in enumerate(aln)
    @show k
    @show i
    @show j
    println(' ')
end

Output:

collect(aln) = [('A', 'A'), ('C', 'C'), ('G', 'G'), ('\0', '\0'), ('\0', '\0')]
k = 1
i = 'A'
j = 'A'
 
k = 2
i = 'C'
j = 'C'
 
k = 3
i = 'G'
j = 'G'

mbauman · May 4, 2022, 9:22pm

What is length(aln)? I’m guessing your discrepancy is there.

Sukera · May 4, 2022, 9:27pm

I’ve seen such behavior a few times. It usually happens when length claims one length, but actually iterating produces fewer elements. collect preallocates based on length, but doesn’t shrink to the number of elements actually produced (counting them may be slow).
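
To illustrate, here is a minimal sketch that reproduces the same kind of mismatch without BioAlignments. The `Overcounted` type and its fields are hypothetical, made up for this example only. Its `length` claims more elements than `iterate` produces, so `collect` should return a vector of the claimed length while a `for` loop visits only the elements that are really there. The trailing entries of the collected vector are never written, so their values should not be relied on.

```julia
# Hypothetical iterator whose `length` over-reports the number of elements.
struct Overcounted
    n::Int        # number of elements `iterate` actually produces
    claimed::Int  # number of elements `length` claims
end

# Yield 1, 2, ..., n and then stop.
Base.iterate(it::Overcounted, state=1) = state > it.n ? nothing : (state, state + 1)
Base.length(it::Overcounted) = it.claimed
Base.eltype(::Type{Overcounted}) = Int

it = Overcounted(3, 5)

# `collect` sizes its result from `length(it)`.
collected = collect(it)

# A `for` loop only sees what `iterate` yields.
looped = Int[]
for x in it
    push!(looped, x)
end

@show length(collected)
@show length(looped)
@show collected[1:3] == looped
```

Comparing `length(itr)` with the number of elements a plain loop visits is a quick way to check for this kind of bug.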

It would be interesting to add an opt-in LengthBounded() trait that allows shrinking the resulting collection when fewer than length(itr) elements are actually produced.

MillironX · May 4, 2022, 9:29pm

julia> length(aln)
5
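
That matches the diagnosis: `length` reports 5, while iteration yields only three pairs. Besides correcting `length` itself, one existing way to keep `collect` and `for` in agreement is to declare the iterator's size as unknown, in which case `collect` grows its result as elements arrive instead of preallocating. The sketch below uses a hypothetical stand-in type, `ThreeItems`; it is not the BioAlignments implementation, and whether this is the right fix for `PairwiseAlignment` is an assumption.

```julia
# Hypothetical stand-in type; not the BioAlignments implementation.
struct ThreeItems end

# Yield 1, 2, 3 and then stop.
Base.iterate(::ThreeItems, state=1) = state > 3 ? nothing : (state, state + 1)
Base.eltype(::Type{ThreeItems}) = Int

# Declare the size unknown so `collect` does not rely on `length`.
Base.IteratorSize(::Type{ThreeItems}) = Base.SizeUnknown()

result = collect(ThreeItems())
@show result
```

The trade-off is that `length` is no longer available to callers that want it, so fixing `length` to count only the elements actually produced is usually preferable when the count is cheap to compute.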
