[ANN] BioRecordsProcessing.jl - Easily process your biological records

jonathanBieler · October 21, 2021, 1:10pm

This is a small utility package that helps reduce the boilerplate when processing files containing biological records (fastqs, bams, vcfs, …). It deals with file management, opening & closing readers and writers, processing files in parallel, etc. In theory one can replicate most of the options in classic tools like samtools, vcftools, seqtk, etc. (although it doesn’t really take advantage of indexed files currently).

Here’s an example where records in fasta files are filtered out when the sequence is shorter than 50 (returning nothing drops the record, returning the record keeps it):

using BioRecordsProcessing, FASTX

BioRecordsProcessing.process_directory(FASTX.FASTA, input_directory, "*.fa", output_directory; max_records=100) do record
    return FASTX.FASTA.seqlen(record) < 50 ? nothing : record
end
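To make the keep-or-drop convention in the do-block concrete, here is a minimal sketch in plain Julia with no dependencies. It is not the package's implementation: `ToyRecord` and `process_records` are hypothetical names made up for illustration, and treating `max_records` as a cap on the number of records read is an assumption about what that option does.

```julia
# Hypothetical stand-in for a FASTA record; this is not a FASTX type.
struct ToyRecord
    id::String
    sequence::String
end

# Hypothetical helper: call `f` on each record and keep whatever it returns,
# skipping `nothing`. Assumes `max_records` caps how many records are read.
function process_records(f, records; max_records = typemax(Int))
    kept = ToyRecord[]
    for (i, record) in enumerate(records)
        i > max_records && break
        result = f(record)
        result === nothing || push!(kept, result)
    end
    return kept
end

records = [ToyRecord("short", "ACGT"), ToyRecord("long", "A"^60)]

# Same shape as the example above: return `nothing` to drop a record.
kept = process_records(records; max_records = 100) do record
    return length(record.sequence) < 50 ? nothing : record
end
```

With these two toy records only the 60-base one survives the filter. Because the function returns a record rather than a Bool, the same pattern would also allow returning a modified record instead of the original.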

I’ve used it a bit myself but I haven’t tested it very thoroughly, so double check that the outputs are correct.
