[ANN] BioRecordsProcessing.jl - Process your biological records with ease

jonathanBieler · June 6, 2023, 9:36am

BioRecordsProcessing.jl aims to process files containing biological records with a minimal amount of boilerplate. It can be used in place of tools like samtools, vcftools, seqtk, etc.

Features :

Read from disk, modify/filter the records (with a user defined function) and write back to disk
Read from or write to memory
Process whole directories in parallel
Handle paired files
Handle compressed files
Support VCF, FASTA/FASTQ and SAM/BAM files (and possibly any Record type that uses the bio record “interface”)

Example :

using BioRecordsProcessing, FASTX, BioSequences

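# `filepath` is the path to a FASTA file on disk
p = Pipeline(
    # source: read FASTA records from the file
    Reader(FASTX.FASTA, File(filepath)),
    # user-defined function: extract each record's sequence as DNA
    record -> begin
        sequence(LongDNA{4}, record)
    end,
    # sink: collect the results in memory as a Vector
    Collect(LongDNA{4}),
)
run(p)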

# output
2-element Vector{LongSequence{DNAAlphabet{4}}}:
 CTTGGCATACTCAAACTCTT
 CTTGGCATACTCAAACTCTT
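
For comparison, here is a rough sketch of the manual loop that a pipeline like the one above saves you from writing, using FASTX and BioSequences directly. The function name `collect_sequences` is made up for illustration and is not part of BioRecordsProcessing.jl, and this is not how the package is implemented internally.

```julia
using FASTX, BioSequences

# Hypothetical helper (not part of BioRecordsProcessing.jl):
# read every record of a FASTA file and collect its sequence in memory.
function collect_sequences(path::AbstractString)
    seqs = LongDNA{4}[]
    # The do-block form closes the reader (and the file) when done.
    FASTAReader(open(path)) do reader
        for record in reader
            # Same per-record function as in the pipeline example.
            push!(seqs, sequence(LongDNA{4}, record))
        end
    end
    return seqs
end
```

Calling `collect_sequences(filepath)` should give the same kind of `Vector{LongSequence{DNAAlphabet{4}}}` as the `Collect` sink, but the file handling, the loop and the output container all have to be written by hand.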

Missing features :

I think it would be useful to be able to provide a genomic interval to read from (especially for BAM files) and also to group records based on some user-defined criteria (e.g. read names for paired-end BAM files).
