Exact the sequence of chr1 in XAM.jl

I only want to exact the sequence of chr1 not all.

using XAM
bamfile="..."
reader=open(BAM.Reader,bamfile)
for record in reader
    println(BAM.refname(record))
end

This returns all the chromosomes.I just want chr1.But I dont know how to do.
I tried eachoverlap,but the function need start(1) and end(10000).

reader=open(BAM.Reader,bamfile,index=bamfile*".bai")
for record in eachoverlap(reader,"chr1",1:10000)
    println(BAM.refname(record))
end

and i also tried

reader.refseqlens
25-element Vector{Int64}:
 248956422
 133797422
 135086622
 133275309
 114364328
 107043718
 101991189
  90338345
  83257441
  80373285
  58617616
 242193529
  64444167
  46709983
  50818468
 198295559
 190214555
 181538259
 170805979
 159345973
 145138636
 138394717
     16569
 156040895
  57227415
for record in eachoverlap(reader,"chr1",1:248956422)
    println(BAM.refname(record))
end

It can return all of the chr1. I can only make it using eachoverlap(slow?).Are there any ways to make it faster? Thanks very much.

The eachoverlap method currently provides the best way to retrieve records.

You may want to use the recorded reference sequence length to set the end/right-hand position in the query.

reader = open(BAM.Reader, file_bam, index=file_bai)

chrom = "chr1"
idx = findfirst(isequal(chrom), reader.refseqnames)
refseqlen = reader.refseqlens[idx]

itr = eachoverlap(reader, Interval(chrom, 1, refseqlen)) # Equivalently eachoverlap(reader, chrom, 1:refseqlen)
records = collect(itr)

If working with a SAM file, you can find the reference sequence length in the header metadata.

idx = findfirst(reader.header.metainfo) do m
    SAM.tag(m) == "SQ" && m["SN"] == chrom
end
refseqlen = parse(Int, reader.header.metainfo[idx]["LN"])

I admit it’s worth adding short-hand eachoverlap(reader::Reader, refname::String) methods to XAM.

1 Like