[ANN] BioRecordsProcessing.jl - Process your biological records with ease

BioRecordsProcessing.jl aims to process files containing biological records with a minimum of boilerplate. It can be used in place of tools like samtools, vcftools, seqtk, etc.

Features:

  • Read from disk, modify/filter the records (with a user-defined function) and write back to disk
  • Read or write to memory
  • Process whole directories in parallel
  • Handle paired files
  • Handle compressed files
  • Supports VCF, FASTA/FASTQ and SAM/BAM (and possibly any record type that follows the bio record "interface")
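The core idea (read records, apply a user function that modifies or drops each one, then collect or write the results) can be sketched in plain Julia. This is a hypothetical illustration, not the package's implementation: `process_records` is an invented helper, plain strings stand in for records, and returning `nothing` to drop a record is an assumed convention.

```julia
# Hypothetical helper, NOT part of BioRecordsProcessing.jl.
# Applies `f` to each record and collects the results into a Vector{T}.
# Assumed convention: `f` returns `nothing` to filter a record out.
function process_records(f, ::Type{T}, records) where {T}
    out = T[]
    for record in records
        result = f(record)
        result === nothing && continue  # record filtered out
        push!(out, result)              # keep the (possibly modified) record
    end
    return out
end

# Stand-in "records": plain DNA strings instead of FASTA records
records = ["ACGT", "NNNN", "CTTG"]

# Drop records containing ambiguous bases, lowercase the rest
kept = process_records(String, records) do r
    occursin('N', r) ? nothing : lowercase(r)
end
```

Passing the output type up front plays the same role as `Collect(LongDNA{4})` in the package example: the results land in a concretely typed vector.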

Example:

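using BioRecordsProcessing, FASTX, BioSequences

filepath = "reads.fasta"  # hypothetical path to a FASTA file

p = Pipeline(
    Reader(FASTX.FASTA, File(filepath)),     # read FASTA records from a file
    record -> sequence(LongDNA{4}, record),  # extract each record's sequence
    Collect(LongDNA{4}),                     # gather the results in memory
)
run(p)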

# output
2-element Vector{LongSequence{DNAAlphabet{4}}}:
 CTTGGCATACTCAAACTCTT
 CTTGGCATACTCAAACTCTT

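For comparison, here is a rough sketch of the reading part done by hand with FASTX.jl, assuming the FASTX.jl v2 and BioSequences.jl v3 APIs. The inline data is made up to stand in for the file at `filepath`. This is not how the package works internally, and it leaves out the writing, compression and directory handling that the pipeline also takes care of.

```julia
using FASTX, BioSequences

# Made-up in-memory FASTA data standing in for the file at `filepath`
data = """
>seq1
CTTGGCATACTCAAACTCTT
>seq2
CTTGGCATACTCAAACTCTT
"""

sequences = LongDNA{4}[]
reader = FASTX.FASTA.Reader(IOBuffer(data))  # a FASTA reader works on any IO
for record in reader
    push!(sequences, sequence(LongDNA{4}, record))  # decode each record's sequence
end
close(reader)
```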
Missing features:

I think it would be useful to be able to read only from a given genomic interval (especially for BAM files), and also to group records based on a user-defined criterion (e.g. read names for paired-end BAM files).
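Both missing features are easy to state in code. The sketch below uses plain Julia with a hypothetical `Rec` type standing in for BAM records; `group_consecutive` and `in_region` are invented names. Grouping assumes name-sorted input (so mates are adjacent), and the region check scans every record, whereas a real BAM implementation would presumably use the file index to seek directly to the interval.

```julia
# Hypothetical stand-in for an alignment record (not a real BAM record type)
struct Rec
    name::String   # read name
    chrom::String  # reference sequence name
    pos::Int       # leftmost mapping position
end

# Group consecutive records sharing the same key, e.g. mates in a
# name-sorted BAM. Returns a vector of groups, each a vector of records.
function group_consecutive(key, records::AbstractVector{R}) where {R}
    groups = Vector{Vector{R}}()
    for r in records
        if !isempty(groups) && key(last(groups[end])) == key(r)
            push!(groups[end], r)  # same key as previous record: extend group
        else
            push!(groups, R[r])    # new key: start a new group
        end
    end
    return groups
end

# Naive interval restriction: record is on `chrom` with pos in [start, stop]
in_region(r::Rec, chrom, start, stop) = r.chrom == chrom && start <= r.pos <= stop

recs = [Rec("read1", "chr1", 100), Rec("read1", "chr1", 350),
        Rec("read2", "chr2", 42), Rec("read2", "chr2", 97)]

mate_groups = group_consecutive(r -> r.name, recs)        # 2 groups of 2 records
inside = filter(r -> in_region(r, "chr1", 1, 200), recs)  # only read1 at pos 100
```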
