Hi all,
I work in bioinformatics research. I started using Julia two years ago and now use it as my main language for everything I do. I organize the reusable parts of my code as a Julia package, which I try to keep fairly general, but in its current form it's still basically my personal toolbox. My question is whether there is any interest in this in the community. If so, I would try to turn it into a more generally usable, fully documented and tested package.
So, what is it? I call it RNASeqTools (github), and it builds a convenience layer on top of BioJulia specifically for RNA-seq data. It provides data structures, constructors, and some functions I commonly use to conveniently read and then manipulate and/or combine:
- read files (.fastq, .fastq.gz, .fasta, .fasta.gz)
- genome files (.fa)
- annotation files (.gff)
- alignment files (.bam)
- coverage files (.bw)
- counts files (.csv)
- folders of files of the above types
Some functionality provided by the package:
- convenient iterators for FASTX and XAM records
- efficient annotation of alignments with IntervalTrees
- handling of chimeric read alignments
- computation of coverage from alignments
- simple signal detection in coverage
- computation of feature counts from coverage or alignments
- simple differential gene expression with feature counts using GLMs
- motif generation and search in genomes with BioSequences
- sample demultiplexing
- some wrappers for external tools (bwa-mem2, minimap2, samtools, fastp)
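To give a rough idea of what one of these items involves, here is a minimal sketch of computing coverage from alignments. It is not the package's actual implementation: the `coverage` function and the use of plain integer ranges in place of BAM records are assumptions made for illustration, and it only needs base Julia.

```julia
# Hypothetical helper, not part of RNASeqTools: per-base coverage on a single
# reference sequence, given alignments as 1-based, inclusive position ranges.
function coverage(intervals::Vector{UnitRange{Int}}, reflen::Int)
    delta = zeros(Int, reflen + 1)        # difference array with one extra slot
    for iv in intervals
        isempty(iv) && continue
        start = max(first(iv), 1)         # clip to the reference bounds
        stop = min(last(iv), reflen)
        start > stop && continue
        delta[start] += 1                 # coverage rises at the alignment start
        delta[stop + 1] -= 1              # and falls right after its end
    end
    return cumsum(delta)[1:reflen]        # running sum gives per-base depth
end

alignments = [1:4, 3:6, 5:10]
cov = coverage(alignments, 10)            # [1, 1, 2, 2, 2, 2, 1, 1, 1, 1]
```

The difference-array approach touches each alignment once and each reference position once, so it stays cheap even for many overlapping reads.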
I know this is pretty bloated, but I could see parts of it being useful to others (maybe split into several packages?), so I would be happy about any suggestions on what a useful package could look like and what it should contain.
Thanks in advance