Julia package to calculate coverage and depth from sam/bam files

Hi there,

I am wanting to generate coverage and depth statistic for some sam/bam files I have and was wondering if Julia has a package that can do this for me?

Disclaimer: I have no idea what “sam/bam files” are.

A google search for “sam/bam files jl” led me directly to SAM and BAM · BioAlignments.jl which seems to suggest that BioAlignments.jl might be what you’re looking for :slight_smile:

Alternatively, GitHub - BioJulia/XAM.jl: Parse and process SAM and BAM formatted files might be worth a look.

More generally, you should probably check out https://biojulia.net/.


I think there’s a function that computes coverage from an array of alignements somewhere, but here’s how you can do it from scratch, it’s a little bit complicated but it gives you complete control on how the coverage is computed (you can filter out reads, softclips, etc) :

using BioAlignments, XAM
import XAM.BAM

function get_coverage(bam_file, chr)

    # use dictionnaries to store coverage
    coverage = Dict(chr => Dict{Int,Int}() for chr in chr)

    reader = open(BAM.Reader, bam_file)
    record = BAM.Record()

    while !eof(reader)

        read!(reader, record)
        !BAM.ismapped(record) && continue
        chr = BAM.refname(record)
        aln = BAM.alignment(record)
        # anchor is one base before sequence, so +1
        seqrange = (first(aln.anchors).seqpos + 1):last(aln.anchors).seqpos
        #loop through the sequence coordinates
        for i in seqrange
            # get corresponding position in the reference genome
            refpos, OP = seq2ref(aln, i)

            # count only matches (skip softclip, deletions, ..)
            if OP == OP_MATCH
                coverage[chr][refpos] = get(coverage[chr], refpos, 0) + 1 


bam_file = "..."
chr = vcat(string.(1:22), "X", "Y")

cov = get_coverage(bam_file, chr)

sum(values(cov["20"])) # total coverage on chr. 20

This helps with alignments and anchors : Alignment representation · BioAlignments.jl


Just note that for paired-end reads that might counts alignments twice, but not 100% sure.

Many thanks, I will give this a go and see how it compares to other tools!

1 Like