ATG at the start looks for the START codon, then (?: )*? repeats whatever is inside it 0 or more times, but lazily (that's the ? after the *). This skips codons until the first match (because it is lazy) of the next group. The next group is 3 string options covering the stop codons. The [AGTC] repeated three times matches a codon, as each square bracket matches a single letter from inside it.
So actually, it could be even simpler:
r"ATG(?:[AGTC]{3})*?T(AG|AA|GA)"
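A quick way to check the lazy behaviour is to run the pattern next to a greedy copy of itself. This is only a sketch: the test sequence and the names ORF_LAZY, ORF_GREEDY and seq are made up for illustration, and it assumes Python's built-in re module.

```python
import re

# The pattern from above: ATG, then as few whole codons as possible,
# then one of the stop codons TAG, TAA or TGA.
ORF_LAZY = re.compile(r"ATG(?:[AGTC]{3})*?T(AG|AA|GA)")

# Same pattern without the ? after *, so the codon group is greedy.
ORF_GREEDY = re.compile(r"ATG(?:[AGTC]{3})*T(AG|AA|GA)")

# Made-up sequence: CC ATG AAA TTT TAG GGT TAA CC
# It has two in-frame stop codons after the ATG (TAG, then TAA).
seq = "CCATGAAATTTTAGGGTTAACC"

lazy = ORF_LAZY.search(seq)
greedy = ORF_GREEDY.search(seq)

print(lazy.group(0))    # ATGAAATTTTAG        (stops at the first in-frame stop codon)
print(greedy.group(0))  # ATGAAATTTTAGGGTTAA  (runs on to the last in-frame stop codon)
```

The only difference between the two patterns is the ? after the *, which is what makes the match end at the first stop codon instead of the last one.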
This link gives a summary of the common and uncommon regular expression syntax (it is written for PCRE2; Python's re module supports most of it, but not all):
https://www.pcre.org/current/doc/html/pcre2syntax.html