I am trying to use Iterators.product to generate a list of all possible k-mers over a given alphabet. Since I am a biologist, the letters happen to be the bases of DNA: “ATGC”.
If I want to generate 3-mers then this works perfectly:
nucs = "ATGC"
join.(Iterators.product(nucs,nucs,nucs))
4×4×4 Array{String, 3}:
[:, :, 1] =
"AAA" "ATA" "AGA" "ACA"
"TAA" "TTA" "TGA" "TCA"
"GAA" "GTA" "GGA" "GCA"
"CAA" "CTA" "CGA" "CCA"
[:, :, 2] =
"AAT" "ATT" "AGT" "ACT"
"TAT" "TTT" "TGT" "TCT"
"GAT" "GTT" "GGT" "GCT"
"CAT" "CTT" "CGT" "CCT"
[:, :, 3] =
"AAG" "ATG" "AGG" "ACG"
"TAG" "TTG" "TGG" "TCG"
"GAG" "GTG" "GGG" "GCG"
"CAG" "CTG" "CGG" "CCG"
[:, :, 4] =
"AAC" "ATC" "AGC" "ACC"
"TAC" "TTC" "TGC" "TCC"
"GAC" "GTC" "GGC" "GCC"
"CAC" "CTC" "CGC" "CCC"
However, to extend the same method to longer words, I have to keep repeating “nucs” as an argument to Iterators.product. What I would like is a way to specify the word size and generate all possible words of that size. So far, I have tried this:
join.(Iterators.product(Iterators.repeated(nucs,3)))
3-element Vector{String}:
"ATGC"
"ATGC"
"ATGC"
I tried with fill too:
join.(Iterators.product(fill(nucs,1,3)))
join.(Iterators.product(fill(nucs,3)))
but I get the same output.
Using collect(nucs) instead of nucs also doesn’t give me the desired output.
Any solutions?
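One possible direction, offered as a sketch rather than a definitive answer: in the attempts above, Iterators.product receives a single argument (the repeated iterator or the filled array), so it iterates over that one container, and each element is the whole string "ATGC". Splatting the container with `...` passes each copy of nucs as a separate argument instead. The function name kmers and its arguments below are hypothetical, chosen only for illustration.

```julia
# Hypothetical helper (not part of Base): all words of length k over `alphabet`.
function kmers(alphabet::AbstractString, k::Integer)
    # Build a tuple holding k copies of the alphabet, e.g. (nucs, nucs, nucs) for k = 3.
    copies = ntuple(_ -> alphabet, k)
    # Splat the tuple so Iterators.product sees k separate arguments.
    words = join.(Iterators.product(copies...))
    # `words` is a k-dimensional array; flatten it into a plain vector.
    return vec(words)
end

nucs = "ATGC"
threemers = kmers(nucs, 3)
println(length(threemers))   # 4^3 = 64 words
println(threemers[1:4])      # first position varies fastest: AAA, TAA, GAA, CAA
```

Assuming splatting behaves as it does for any iterable, Iterators.product(Iterators.repeated(nucs, 3)...) and Iterators.product(fill(nucs, 3)...) should give the same words. The number of words grows as 4^k, so for large k it may be better to iterate over the product lazily than to materialize the full array.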