[ANN] BioRecordsProcessing.jl - Easily process your biological records

This is a small utility package that helps reduce the boilerplate when processing files containing biological records (FASTQ, BAM, VCF, …). It handles file management, opening and closing readers and writers, processing files in parallel, etc. In theory, one can replicate most of the options of classic tools like samtools, vcftools, seqtk, etc. (although it doesn’t really take advantage of indexed files yet).

Here’s an example where records in FASTA files are filtered out when their sequence is shorter than 50:

using BioRecordsProcessing, FASTX

BioRecordsProcessing.process_directory(FASTX.FASTA, input_directory, "*.fa", output_directory; max_records=100) do record
    # Return `nothing` to filter a record out, or the record itself to keep it.
    return FASTX.FASTA.seqlen(record) < 50 ? nothing : record
end
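
For comparison, here is a rough sketch of the kind of boilerplate this is meant to save, written by hand with FASTX only. It is not how the package is implemented: `filter_fasta_directory` and its `suffix` keyword are hypothetical names made up for this illustration, it matches files by suffix rather than by glob pattern, and it ignores parallel processing and `max_records`. The sequence length is taken as `length(FASTA.sequence(record))` on the assumption that this works across FASTX versions.

```julia
using FASTX

# Hypothetical helper, NOT part of BioRecordsProcessing.jl: apply `f` to every
# record of every matching file and write whatever `f` returns (unless it is
# `nothing`) to a file of the same name in `output_directory`.
function filter_fasta_directory(f, input_directory, output_directory; suffix = ".fa")
    mkpath(output_directory)
    for name in filter(endswith(suffix), readdir(input_directory))
        reader = FASTA.Reader(open(joinpath(input_directory, name)))
        writer = FASTA.Writer(open(joinpath(output_directory, name), "w"))
        try
            for record in reader
                out = f(record)
                # `nothing` means "drop this record"
                out === nothing || write(writer, out)
            end
        finally
            # Closing the reader and writer also closes the underlying files
            close(reader)
            close(writer)
        end
    end
end

# Small self-contained run on a temporary directory with a toy FASTA file
input_directory = mktempdir()
output_directory = mktempdir()
write(joinpath(input_directory, "example.fa"),
      ">short\nACGT\n>long\n" * "A"^60 * "\n")

filter_fasta_directory(input_directory, output_directory) do record
    length(FASTA.sequence(record)) < 50 ? nothing : record
end
```

Even in this stripped-down form, most of the lines deal with paths and opening and closing files rather than with the records themselves.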

I’ve used it a bit myself, but I haven’t tested it very thoroughly, so double-check that the output is correct.
