Loop (i, j) without zip() stopped at the 2nd round

I’m trying to loop over two DNA sequences:

a = "AAAGCTAGCTAGCTAGACT"
b = "ATAGCTAGCCAGCTAAACT"

for (i, j) in (a,b)
    println(i, j)
end

It gives a result:

AA
AT

Why does the loop stop at the 2nd iteration?

I know I should use zip() instead:

for (i, j) in zip(a,b)
    println(i, j)
end

and it gave the right answer.

AA
AT
AA
GG
CC
TT
AA
GG
CC
TC
AA
GG
CC
TT
AA
GA
AA
CC
TT

But I still don’t understand why the iteration stopped after the 2nd round without zip(), and why it didn’t raise an error. Does anyone have any ideas? I couldn’t find the answer in the docs (or I didn’t look in the right place).

Julia version:

v"1.7.3"

Interesting approaches:

You’re creating a tuple (a,b), which is iterated in the for loop. So the first iteration will give you just a:

julia> iterate((a,b))
("AAAGCTAGCTAGCTAGACT", 2)

(The 2 is part of the iteration protocol, it’s the state of iteration).

The a is then destructured due to you writing (i,j) in the loop head:

julia> (i,j) = a
"AAAGCTAGCTAGCTAGACT"

julia> i
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> j
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> (i,j) = b
"ATAGCTAGCCAGCTAAACT"

julia> i
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> j
'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)

Once both elements of the (a,b) tuple have been visited, the loop terminates, having printed only the first two characters of each string.
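To put the two behaviours side by side, here is a small sketch using the same a and b as in the question. The names tuple_items and char_pairs are just made up for this illustration.

```julia
a = "AAAGCTAGCTAGCTAGACT"
b = "ATAGCTAGCCAGCTAAACT"

# Iterating the tuple yields its two elements, i.e. the whole strings.
tuple_items = collect((a, b))

# Destructuring a string takes characters off its front.
i, j = b

# zip instead yields one pair of characters per position.
char_pairs = collect(zip(a, b))

println(length(tuple_items))  # number of loop iterations without zip
println(i, j)                 # what the second iteration printed
println(length(char_pairs))   # number of loop iterations with zip
```

So the tuple loop runs once per string, while the zip loop runs once per character position.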

The loop you’ve written is equivalent to this one, which may make it clearer why and how it works the way it does:

itr = (a,b)
_tmp = iterate(itr)
while _tmp !== nothing
    obj, state = _tmp
    # your loop body/specification starts
    i, j = obj
    println(i, j)
    # your loop body/specification ends
    _tmp = iterate(itr, state)
end

For more information, see this part of the docs about the iteration protocol.


Thank you for your answer.
I’m not sure if I completely understood this:

Do you mean the tuple (i, j) first takes the first two letters of a, and then the first two letters of b? (It took me a while to get this clear in my mind :joy:)

I wonder what “state” means here. Does it mean the depth or the length of the tuple used for iteration? (e.g. in this case the length of (i,j) is 2)

Because when I iterate over a longer tuple, it works:

a = "AAAGCTAGCTAGCTAGATC"
b = "ATAGCTAGCCAGCTAAACT"
c = "AGAGGATCGCAGCTATTGA"

str = collect(i for i in string.("i", 1:length(a)))

for (str) in (a,b,c)
    println(str)
end

Output:

AAAGCTAGCTAGCTAGATC
ATAGCTAGCCAGCTAAACT
AGAGGATCGCAGCTATTGA

:grinning:

Yes, exactly.

The iteration state is a value that gets passed to the next call of iterate - it’s specific to each iterable. In the case of a tuple, it’s just the index of the next element in the tuple.

julia> a = "AAAGCTAGCTAGCTAGACT"
"AAAGCTAGCTAGCTAGACT"

julia> b = "ATAGCTAGCCAGCTAAACT"
"ATAGCTAGCCAGCTAAACT"

julia> tup = (a,b)
("AAAGCTAGCTAGCTAGACT", "ATAGCTAGCCAGCTAAACT")

julia> iterate(tup)
("AAAGCTAGCTAGCTAGACT", 2)

julia> iterate(tup, 2)
("ATAGCTAGCCAGCTAAACT", 3)

julia> iterate(tup, 3) # returns `nothing`, indicating the end of the iteration

The iteration state is arbitrary though, so it can be anything you’d like it to be for your own type.
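As a rough illustration of a state that is not a tuple index, here is a minimal sketch of a custom iterable. Countdown is a hypothetical type invented for this example (it is not part of Base); its state is simply the next value to produce.

```julia
# Hypothetical type, used only to illustrate the iteration protocol.
struct Countdown
    start::Int
end

# With no state given, start from c.start; the state is the next value to yield.
function Base.iterate(c::Countdown, state = c.start)
    state < 1 && return nothing      # `nothing` signals the end of iteration
    return (state, state - 1)        # (current item, next state)
end

# Optional, but lets `collect` preallocate a typed result.
Base.length(c::Countdown) = max(c.start, 0)
Base.eltype(::Type{Countdown}) = Int

for n in Countdown(3)
    println(n)
end
```

The for loop never sees the state; it only sees the first element of each tuple returned by iterate.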

Yes. You’re literally creating a tuple of three strings by writing (a,b,c). Since loops have their own scope, your loop variable str shadows the existing str variable outside of the loop. So the str in your loop first is a, then b, then c.
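A quick way to convince yourself of the shadowing is to check the outer variable after the loop. This is only a sketch: the value given to the outer str and the seen vector are made up for the illustration.

```julia
a = "AAAGCTAGCTAGCTAGATC"
b = "ATAGCTAGCCAGCTAAACT"
c = "AGAGGATCGCAGCTATTGA"

str = ["i1", "i2"]        # some outer variable that happens to be named `str`
seen = String[]

for str in (a, b, c)      # this `str` is a new variable local to the loop
    push!(seen, str)
end

println(seen == [a, b, c])  # the loop variable took the values a, b, c
println(str)                # the outer `str` is untouched
```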

Ahh, no wonder the iteration while loop starts with while _tmp !== nothing

Thanks for your explanation.
Julia is so neat.


For your example, the following would be even neater:
foreach(println, a, b)
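When given several iterables, foreach calls the function with one element from each of them at a time, much like the zip loop. A small sketch that collects the pairs instead of printing them (out is just a name chosen for this example):

```julia
a = "AAAGCTAGCTAGCTAGACT"
b = "ATAGCTAGCCAGCTAAACT"

out = String[]
# The anonymous function receives one character from `a` and one from `b`.
foreach((x, y) -> push!(out, string(x, y)), a, b)

println(length(out))
println(out[1:2])
```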

Didn’t know foreach().

Thanks a lot!

zip is probably the best choice in this case, but here is another option:

julia> for i in eachindex(a)
           println(a[i], b[i])
       end
AA
AT
AA
GG
CC
TT
AA
GG
CC
TC
AA
GG
CC
TT
AA
GA
AA
CC
TT
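One caveat with the index-based version: it uses the indices of a to index into b, which is fine for equal-length ASCII strings like DNA sequences, but not for strings in general. A hedged sketch of a guard, where paired is a hypothetical helper name made up for this example:

```julia
# Hypothetical helper: pair up characters by index when that is safe,
# and fall back to zip otherwise.
function paired(a::AbstractString, b::AbstractString)
    if isascii(a) && isascii(b) && ncodeunits(a) == ncodeunits(b)
        # For ASCII strings every index in 1:ncodeunits is a valid character index.
        return [string(a[i], b[i]) for i in eachindex(a)]
    end
    return [string(x, y) for (x, y) in zip(a, b)]
end

println(paired("AAAGCTAGCTAGCTAGACT", "ATAGCTAGCCAGCTAAACT")[1:3])
println(paired("αβ", "ab"))   # non-ASCII input takes the zip branch
```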

You can read more about tuple destructuring here.

I know it’s not relevant to your question, but if you work with DNA sequences, I recommend checking out BioSequences, which contains relatively lightweight types specifically made for biological sequences like DNA.


Yes for sure! THANKS!