Missing functions in namespace from BioSequences

I feel like I’m missing something fundamental, but I’m unable to follow many of the examples in the BioSequences documentation, because many of the functions they use remain undefined in my namespace. A few key ones that are frustrating me:

BioSequences.each() - I receive an UndefVarError.

I also receive an UndefVarError for DNAMer, DNAKmer, BigMer, etc.

Finally, if I try to construct DNA sequences from strings:

LongDNASeq("String"): UndefVarError
LongDNA("String"): no method matching LongDNA(::String)
LongSequence{DNAAlphabet{2}}("String"): Works, which is nice, but I don’t understand why LongDNASeq isn’t working as shown in the documentation.

I’m still new to Julia, so I could be getting my types vs functions vs methods muddled up, but I’m not sure why I can’t access these functions, especially BioSequences.each().

I’ve tried removing and re-adding BioSequences v3.1.3 from the registry, from the current master branch, and all releases down to 2.0.6, with the same results. I’m using Julia v1.8.5 on macOS Ventura 13.3.1.

Has anyone else encountered this?
Here are the names exported from BioSequences for me:

[Symbol("@aa_str"), Symbol("@biore_str"), Symbol("@dna_str"), Symbol("@prosite_str"), Symbol("@rna_str"), :AASeq, :AA_A, :AA_B, :AA_C, :AA_D, :AA_E, :AA_F, :AA_G, :AA_Gap, :AA_H, :AA_I, :AA_J, :AA_K, :AA_L, :AA_M, :AA_N, :AA_O, :AA_P, :AA_Q, :AA_R, :AA_S, :AA_T, :AA_Term, :AA_U, :AA_V, :AA_W, :AA_X, :AA_Y, :AA_Z, :ACGT, :ACGTN, :ACGU, :ACGUN, :Alphabet, :AminoAcid, :AminoAcidAlphabet, :ApproximateSearchQuery, :BioRegex, :BioRegexMatch, :BioSequence, :BioSequences, :DNA, :DNAAlphabet, :DNA_A, :DNA_B, :DNA_C, :DNA_D, :DNA_G, :DNA_Gap, :DNA_H, :DNA_K, :DNA_M, :DNA_N, :DNA_R, :DNA_S, :DNA_T, :DNA_V, :DNA_W, :DNA_Y, :ExactSearchQuery, :LongAA, :LongDNA, :LongNuc, :LongRNA, :LongSequence, :LongSubSeq, :NucSeq, :NucleicAcid, :NucleicAcidAlphabet, :PFM, :PWM, :PWMSearchQuery, :RNA, :RNAAlphabet, :RNA_A, :RNA_B, :RNA_C, :RNA_D, :RNA_G, :RNA_Gap, :RNA_H, :RNA_K, :RNA_M, :RNA_N, :RNA_R, :RNA_S, :RNA_U, :RNA_V, :RNA_W, :RNA_Y, :SamplerUniform, :SamplerWeighted, :alphabet, :canonical, :canonical!, :captured, :complement, :complement!, :gap, :gc_content, :hasambiguity, :isGC, :isambiguous, :iscanonical, :iscertain, :iscompatible, :isgap, :ispalindromic, :ispurine, :ispyrimidine, :isrepetitive, :join!, :majorityvote, :matched, :matches, :maxscore, :mismatches, :n_ambiguous, :n_certain, :n_gaps, :ncbi_trans_table, :randaaseq, :randdnaseq, :randrnaseq, :randseq, :reverse_complement, :reverse_complement!, :scoreat, :seqmatrix, :symbols, :translate, :translate!, :ungap, :ungap!]
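A quick way to check which documented names actually exist in the version you have loaded is `isdefined(module, :name)`; `names(module)` gives the exported list above. The sketch below is a small hypothetical helper, `missing_names` (not part of any package), that reports which symbols in a list a module does not define. It uses `Base` so it runs without extra packages; after `using BioSequences` you would pass `BioSequences` and symbols such as `:each`, `:DNAMer` or `:LongDNASeq` instead.

```julia
# Hypothetical helper: return the symbols in `syms` that are NOT defined in `mod`.
# isdefined(mod, sym) also sees unexported names, so it catches names that
# exist but are simply not exported.
function missing_names(mod::Module, syms)
    return [s for s in syms if !isdefined(mod, s)]
end

# Base defines `map` and `filter`, but has no `DNAMer`.
println(missing_names(Base, [:map, :filter, :DNAMer]))
```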


It appears the documentation may be lagging behind the changes in some circumstances. It’s also possible that you have loaded an old version of the package or are looking at old documentation.

I note the following in the changelog:

LongDNASeq is now just LongDNA.

The version number of the package you are using may be useful. For example, I am using BioSequences version 3.1.3 below.

```
julia> using Pkg

julia> pkg"status"
Status `/tmp/jl_se5Mm4/Project.toml`
  [7e6ae17a] BioSequences v3.1.3
```

It might be useful to also indicate what documentation you are looking at. I cannot find a reference to LongDNASeq other than in the changelog above.

Here is my attempt to get your examples to work, based on the "Constructing sequences" page of the BioSequences.jl documentation:

```
julia> LongDNA{4}("TTANC")
5nt DNA Sequence:
TTANC

julia> LongSequence{DNAAlphabet{2}}("TTAGC")
5nt DNA Sequence:
TTAGC

julia> LongRNA{4}("UUANC")
5nt RNA Sequence:
UUANC
```

Wow, this is so helpful!

I’ve confirmed I’m working with BioSequences v3.1.3.

Found my big mistake: I’ve been working with the documentation for v2.0.4! This makes a lot more sense. LongDNA is working for me; I wasn’t supplying the bit number (the {4} in LongDNA{4}) before.
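On the types-versus-constructors question: as far as I can tell, BioSequences v3 defines `LongDNA{N}` as an alias for `LongSequence{DNAAlphabet{N}}` with the bit width `N` left free, so a bare `LongDNA("...")` has no concrete type to construct. The toy types below (`ToyAlphabet`, `ToyDNAAlphabet`, `ToySeq`, `ToyDNA`) are hypothetical stand-ins, not BioSequences code, that reproduce the same behaviour in plain Julia.

```julia
# Hypothetical stand-ins for an alphabet parameterised by bits per symbol.
abstract type ToyAlphabet end
struct ToyDNAAlphabet{N} <: ToyAlphabet end

# The type parameter A does not appear in any field, so Julia defines no
# ToySeq(data) constructor that could infer A from the argument.
struct ToySeq{A<:ToyAlphabet}
    data::String
end

# Alias with a free parameter N, mirroring (as I understand it)
# LongDNA{N} = LongSequence{DNAAlphabet{N}} in BioSequences v3.
const ToyDNA{N} = ToySeq{ToyDNAAlphabet{N}}

ok = ToyDNA{4}("TTANC")   # works: every type parameter is specified

# Leaving N free raises a MethodError, just like LongDNA("TTANC").
failed = try
    ToyDNA("TTANC")
    false
catch err
    err isa MethodError
end
println(failed)
```

That is why `LongDNA{4}("TTANC")` works while `LongDNA("TTANC")` does not: the `{4}` picks a concrete type.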

I notice that the stable documentation is pretty scarce on the iteration front. Have those capabilities moved to a different package?

Thanks again!

Hi Gus, welcome to the Julia community! And thanks for reporting issues.

Bioinformatics in Julia is in a particularly precarious place right now due to a confluence of (1) lots of innovation and iteration (yay!) and (2) a small number of key developers working somewhat in isolation on different parts of the ecosystem (boo!). Most of those key developers are also scientists trying to get other work done, so time is a major constraint, and as you might imagine, documentation is often the first thing to suffer.

Which sucks for sure, and is a real bummer for newcomers to the language especially. Definitely don’t hesitate to reach out here or on Slack (the #biology channel is watched by most of us). And also don’t hesitate to open issues about stuff where docs are lacking. One piece of good news is that we might be getting a GSoC student 🤞 who’s going to revamp the website and do some tutorials and other docs improvements, which we sorely need.

Regarding your specific question about iteration, I recall some changes to the API there, possibly related to Kmers.jl, @jakobnissen might know more?

Yeah, the story of BioSequences.each is the sad story of Kmers.jl. The function created a kmer iterator over the given biosequence. About 2 years ago or such, Sabrina and I discussed that we were not too happy with the implementation of kmers in Julia:

  • They only supported 2-bit nucleotides
  • They supported a max k of 32 (since they were packed into UInt64)
  • They couldn’t be extended with new alphabets
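To make the first two points concrete, here is a toy 2-bit packing. The functions `base_code` and `pack_kmer` are hypothetical illustrations, not the BioSequences v2 implementation, whose bit order and details may differ. Each unambiguous base takes 2 bits, so a `UInt64` holds at most 64 / 2 = 32 bases, and ambiguity codes such as `N` have no 2-bit representation at all.

```julia
# Hypothetical 2-bit encoding for illustration: A=00, C=01, G=10, T=11.
function base_code(c::Char)
    c == 'A' && return UInt64(0)
    c == 'C' && return UInt64(1)
    c == 'G' && return UInt64(2)
    c == 'T' && return UInt64(3)
    # Ambiguity codes (N, R, Y, ...) need more than 2 bits, so reject them.
    throw(ArgumentError("'$c' cannot be encoded in 2 bits"))
end

# Pack a k-mer into a single UInt64, two bits per base; the first base ends up
# in the most significant of the bits actually used.
function pack_kmer(s::AbstractString)
    k = length(s)
    # 64 bits / 2 bits per base = at most 32 bases.
    k <= 32 || throw(ArgumentError("k = $k does not fit in a UInt64 (max 32)"))
    x = UInt64(0)
    for c in s
        x = (x << 2) | base_code(c)
    end
    return x
end

println(pack_kmer("ACGT"))   # bits 00 01 10 11
```

Lifting either limit means a different storage layout (more bits per base, or several integers per kmer), which is part of what Kmers.jl set out to do.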

So the repo Kmers.jl (originally NTupleKmers) was born to address all these issues, and was being rapidly developed primarily by Sabrina. At the same time I was rewriting BioSequences to v3. Since the kmer code was a big, unwieldy part of BioSequences v2, and we had already decided to move kmer functionality into its own package (which required the breaking change to BioSequences), we simply dropped kmers from BioSequences v3. I then released v3 fully expecting Kmers.jl to be online soon after with the following message in the changelog:

Removed kmer functionality - this is moved to Kmers.jl

Unfortunately, right at that time, Sabrina withdrew, and so Kmers.jl never got finished. It’s something I’d like to get around to “some day”, but it’s pretty tricky work, partly because it involves bit-twiddling integers packed into NTuples, and there is absolutely zero room for inefficiency: everything must compile to the optimal CPU instructions.
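In the meantime, one stopgap for "iterate over the kmers of a sequence" is a lazy generator over overlapping windows. The sketch below uses a hypothetical helper, `each_window` (not a BioSequences function). It relies only on `length` and indexing with a range, so it runs on a plain ASCII `String` as shown; with BioSequences v3 loaded, the same call on e.g. `dna"ACGTA"` should yield `LongDNA{4}` copies, since indexing a `LongSequence` with a range returns a new sequence. It copies every window and does none of the bit-level optimisation described above, so expect it to be far slower than a dedicated kmer type.

```julia
# Hypothetical helper: lazily yield (position, window) pairs for every
# overlapping window of length k. Works for anything supporting length()
# and indexing with a UnitRange (ASCII strings, vectors, ...).
function each_window(seq, k::Integer)
    k >= 1 || throw(ArgumentError("k must be positive"))
    # If length(seq) < k the range is empty and nothing is yielded.
    return ((i, seq[i:i+k-1]) for i in 1:length(seq)-k+1)
end

for (pos, kmer) in each_window("ACGTA", 3)
    println(pos, '\t', kmer)
end
```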

So, lesson learned: Don’t actually rely on volunteers completing work in their free time by a specific date.