Hi there,
I am wanting to generate coverage and depth statistic for some sam/bam files I have and was wondering if Julia has a package that can do this for me?
Hi there,
I am wanting to generate coverage and depth statistic for some sam/bam files I have and was wondering if Julia has a package that can do this for me?
Disclaimer: I have no idea what “sam/bam files” are.
A google search for “sam/bam files jl” led me directly to https://biojulia.net/BioAlignments.jl/latest/hts-files.html which seems to suggest that BioAlignments.jl might be what you’re looking for
Alternatively, https://github.com/BioJulia/XAM.jl might be worth a look.
More generally, you should probably check out https://biojulia.net/.
I think there’s a function that computes coverage from an array of alignements somewhere, but here’s how you can do it from scratch, it’s a little bit complicated but it gives you complete control on how the coverage is computed (you can filter out reads, softclips, etc) :
using BioAlignments, XAM
import XAM.BAM
function get_coverage(bam_file, chr)
# use dictionnaries to store coverage
coverage = Dict(chr => Dict{Int,Int}() for chr in chr)
reader = open(BAM.Reader, bam_file)
record = BAM.Record()
while !eof(reader)
read!(reader, record)
!BAM.ismapped(record) && continue
chr = BAM.refname(record)
aln = BAM.alignment(record)
# anchor is one base before sequence, so +1
seqrange = (first(aln.anchors).seqpos + 1):last(aln.anchors).seqpos
#loop through the sequence coordinates
for i in seqrange
# get corresponding position in the reference genome
refpos, OP = seq2ref(aln, i)
# count only matches (skip softclip, deletions, ..)
if OP == OP_MATCH
coverage[chr][refpos] = get(coverage[chr], refpos, 0) + 1
end
end
end
close(reader)
coverage
end
bam_file = "..."
chr = vcat(string.(1:22), "X", "Y")
cov = get_coverage(bam_file, chr)
sum(values(cov["20"])) # total coverage on chr. 20
This helps with alignments and anchors : https://biojulia.net/BioAlignments.jl/latest/alignments.html#Overview-1
Just note that for paired-end reads that might counts alignments twice, but not 100% sure.
Many thanks, I will give this a go and see how it compares to other tools!