I am pleased to announce the initial release of BioToolkit.jl, a bioinformatics toolkit for Julia. It provides a unified, Biopython-inspired API for sequence analysis, structural biology, phylogenetics, and population genetics.
While the Julia ecosystem has excellent modular packages, users often have to combine several dependencies (BioSequences, XAM, etc.). BioToolkit aims to provide a complete experience with a focus on performance and ease of use, and to be a drop-in solution for common bioinformatics pipelines. The package was originally intended as a replacement for Biopython, but it also includes some features from R’s Bioconductor. Its main advantage over Bioconductor is support for multi-threading and CUDA, which can be used for certain functions with O(N^2) complexity. This should make BioToolkit.jl faster, although that has not been benchmarked yet.
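To illustrate why O(N^2) workloads suit multi-threading, here is a minimal sketch of an all-pairs Hamming distance matrix in plain Julia. It is not BioToolkit.jl's API or implementation: the names `hamming` and `pairwise_hamming` are hypothetical, and the byte-wise comparison assumes ASCII sequences of equal length.

```julia
# Hypothetical helpers in plain Julia; not BioToolkit.jl's API.

# Count mismatching positions between two equal-length ASCII sequences.
function hamming(a::AbstractString, b::AbstractString)
    ca, cb = codeunits(a), codeunits(b)
    length(ca) == length(cb) || throw(ArgumentError("sequences must have equal length"))
    d = 0
    @inbounds for i in eachindex(ca, cb)
        d += ca[i] != cb[i]   # Bool is promoted to Int
    end
    return d
end

# Fill the symmetric N×N distance matrix. Each pair (i, j) with i < j is
# handled by exactly one iteration of the outer loop, so threads never
# write to the same matrix entry.
function pairwise_hamming(seqs::Vector{String})
    n = length(seqs)
    D = zeros(Int, n, n)
    Threads.@threads for j in 1:n
        for i in 1:j-1
            d = hamming(seqs[i], seqs[j])
            D[i, j] = d
            D[j, i] = d
        end
    end
    return D
end

seqs = ["ACGTACGT", "ACGTTCGT", "TCGTACGA", "ACGAACGT"]
D = pairwise_hamming(seqs)
```

Starting Julia with `julia --threads=auto` lets the outer loop run on several cores; with a single thread the same code runs serially.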
Performance Comparison
BioToolkit.jl features original implementations optimized for speed. Preliminary benchmarks show significant improvements over Python baselines and competitive performance within the Julia ecosystem:
- Hamming Distance: Up to 328× faster than Biopython.
- BED Parsing: 5× faster than existing Julia parsers.
- Phylogenetics (NJ Tree): 168× faster than Biopython.
- GPU Acceleration: Native CUDA support for k-mer counting and coverage histograms.
(More benchmarks are available in the repository, along with the code used to benchmark it against Biopython.)
Feature Highlights
It provides APIs for:
- Sequence: Transcription, translation, codon usage (CAI), k-mer analysis.
- Structure: PDB/mmCIF parsing, superposition (Kabsch), SASA, contact maps.
- Omics: Differential expression, single-cell workflows, GWAS, epigenomics.
- I/O: Readers for FASTA, FASTQ, BAM, BED, GFF, VCF, and GenBank.
and many more.
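For readers unfamiliar with the sequence operations listed above, the following sketch shows what transcription and k-mer counting compute, written in plain Julia. The function names `transcribe` and `count_kmers` are hypothetical and are not taken from BioToolkit.jl; the package's actual function names and types may differ, so please consult the repository for the real API.

```julia
# Hypothetical helpers in plain Julia; not BioToolkit.jl's API.

# Transcribe a DNA coding strand to RNA by replacing T with U.
transcribe(dna::AbstractString) = replace(uppercase(dna), 'T' => 'U')

# Count every overlapping k-mer in an ASCII sequence.
function count_kmers(seq::String, k::Integer)
    k > 0 || throw(ArgumentError("k must be positive"))
    counts = Dict{String,Int}()
    # Assumes ASCII input, so character and byte indices coincide.
    for i in 1:(length(seq) - k + 1)
        kmer = seq[i:i+k-1]
        counts[kmer] = get(counts, kmer, 0) + 1
    end
    return counts
end

rna = transcribe("ATGGCCATTG")
counts = count_kmers("ATATAT", 2)
```

A dictionary keyed by strings keeps the sketch short; a performance-oriented implementation would more likely pack each k-mer into an integer.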
Installation
The package is currently unregistered but can be installed via:
```julia
using Pkg
Pkg.add(url="https://github.com/Aditya747S/BioToolkit.jl.git")
```
Links
- Repository: https://github.com/Aditya747S/BioToolkit.jl
As this is an initial development release (v0.1.0-dev), there may be rough edges. All suggestions and criticism are welcome.