When I used Julia’s package called XAM.jl,I was convinced that it was extraordinarily fast.
But when I aligned the reads to the genome and then generated a bam file, how do I restore sequenced reads to undeleted and uninserted reads from the reference genome sequence by the cigar value(like the picture below) in the bam file?Are there any functions in XAM.jl?If I input (cigar,reads,ref),how can i return the the sequence with no deletions and no insertions?
Thanks.But my problem is that I have the BAM file now, and I want to change the sequence of reads into reads without deletion and insertions by reference and cigar instead of using BioAlignment package to align the reads
BAM.alignment(record) returns a BioAlignments.Alignment struct. The printed output that you see is generated by BioAlignments’ show method. It is true that this struct is essentially a representation of the cigar string. But as @jonathanBieler suggested, the representation can be used with PairwiseAlignment to map to both sequence and reference positions.
# Setup example.
cigar = "5M1D5M2I5M1D3M" # cigar = BAM.cigar(record).
seq = dna"ACAAACCCCCCCTTTTTGGG" # seq = BAM.sequence(record).
ref = dna"AAAAATCCCCCTTTTTGGGG" # This is the reference sequence used when generating the alignment. This should be available, but if not maybe explore https://hgdownload.soe.ucsc.edu/goldenPath/).
target = dna"ACAAATCCCCCTTTTTGGGG"
# Retrieve alignment.
aln = Alignment(cigar) # aln = BAM.alignment(record)
# Construct pairwise alignment.
paln = PairwiseAlignment(AlignedSequence(seq, aln.anchors), ref)
# Process alignments.
itr = Base.Generator(paln) do a
# Unpack a.
(sym_seq, sym_ref) = a
# Implement logic...
if (sym_seq == DNA_Gap)
if (sym_ref != DNA_Gap)
# Filter out nothings.
itr = Iterators.filter(!isnothing, itr)
# Collect result.
result = LongDNASeq(collect(itr))
# Compare the result with the target sequence.
result == target