Yeah, the story of `BioSequences.each` is the sad story of Kmers.jl. The function created a kmer iterator over the given biosequence. About two years ago, Sabrina and I discussed that we were not too happy with the implementation of kmers in Julia:
- They only supported 2-bit nucleotides
- They supported a max k of 32 (since they were packed into `UInt64`)
- They couldn’t be extended with new alphabets
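
To make the k ≤ 32 limit concrete, here is a minimal sketch of 2-bit packing. This is not the actual BioSequences v2 code; `ENCODING` and `pack_kmer` are hypothetical names made up for illustration, and I'm assuming the usual A/C/G/T to 0/1/2/3 mapping.

```julia
# Hypothetical illustration, NOT the BioSequences v2 implementation.
# Each nucleotide takes 2 bits, so a 64-bit integer holds at most 32 of them.
const ENCODING = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)

function pack_kmer(seq::AbstractString)
    k = length(seq)
    # 64 bits / 2 bits per base = 32 bases at most
    k <= 32 || throw(ArgumentError("k = $k does not fit in a UInt64"))
    bits = zero(UInt64)
    for c in seq
        # Anything outside A/C/G/T (e.g. N) has no 2-bit code
        haskey(ENCODING, c) || throw(ArgumentError("cannot encode $c in 2 bits"))
        # Make room for the next base and append its 2-bit code
        bits = (bits << 2) | UInt64(ENCODING[c])
    end
    return bits
end

pack_kmer("ACGT")  # 0x000000000000001b
```

The fixed-width integer is what makes this fast, but it is also where all three limitations come from: the alphabet, the maximum k, and the lack of extensibility are all baked into the bit layout.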
So the repo Kmers.jl (originally NTupleKmers) was born to address all these issues, and was being rapidly developed, primarily by Sabrina. At the same time I was rewriting BioSequences to v3. Since the kmer code was a big, unwieldy part of BioSequences v2, and we had already decided to move kmer functionality into its own package (which required a breaking change to BioSequences), we simply dropped kmers from BioSequences v3. I then released v3 with the following message in the changelog, fully expecting Kmers.jl to be online soon after:
> Removed kmer functionality - this is moved to Kmers.jl
Unfortunately, right at that time, Sabrina withdrew, and so Kmers.jl never got finished. It’s something I’d like to get around to “some day”, but it’s pretty tricky work, partly because it involves bit-twiddling integers in NTuples and there is absolutely zero room for inefficiency - everything must compile to the optimal CPU instructions.
So, lesson learned: Don’t actually rely on volunteers completing work in their free time by a specific date.