[ANN] XAMAuxData.jl - read and write auxiliary fields in the SAM, BAM, GFA and PAF formats

I’m happy to announce the latest BioJulia package, XAMAuxData.jl.

This package contains functionality for reading and writing auxiliary data fields in the SAM, BAM, GFA and PAF formats.
This package is low-level, with a focus on performance, robustness and error recovery. It is intended to be used as a library by other packages - specifically packages that work with the aforementioned formats.

What are XAM auxiliary fields?

The file formats SAM, BAM, GFA and PAF are a family of file formats designed by bioinformatician Heng Li and others. They are all relatively simple tab-delimited plaintext files, except for BAM, which is the binary equivalent of SAM, and they all contain a sequence of records.
These records may have additional user-defined data, which is specific to the program that produces the files. These four formats share the same “mini-format” specification of auxiliary data. Since this format is not entirely trivial to implement, it makes sense that this is done in one central package which others can then rely on.
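
To give a feel for what the mini-format looks like, here is a minimal sketch in plain Julia of parsing the text (SAM-style) form, where each field is written as TAG:TYPE:VALUE and fields are separated by tabs. This is not the XAMAuxData.jl API: the names `parse_aux_field` and `parse_aux` are hypothetical helpers made up for illustration, and only the scalar types `i`, `f`, `A` and `Z` are handled. A real implementation also has to deal with the `H` and `B` array types, the binary BAM encoding, and malformed input.

```julia
# Illustrative sketch only, NOT the XAMAuxData.jl API.
# `parse_aux_field` and `parse_aux` are hypothetical helper names.

# Parse one text field of the form TAG:TYPE:VALUE into a `tag => value` pair.
function parse_aux_field(field::AbstractString)
    # limit=3 keeps any ':' characters that occur inside the value itself
    parts = split(field, ':'; limit=3)
    length(parts) == 3 || throw(ArgumentError("expected TAG:TYPE:VALUE, got \"$field\""))
    tag, typ, value = parts
    length(tag) == 2 || throw(ArgumentError("tag must be two characters: \"$tag\""))
    parsed = if typ == "i"
        parse(Int, value)       # signed integer
    elseif typ == "f"
        parse(Float32, value)   # single-precision float
    elseif typ == "A"
        only(value)             # exactly one printable character
    elseif typ == "Z"
        String(value)           # free-form string
    else
        throw(ArgumentError("type '$typ' is not handled in this sketch"))
    end
    return String(tag) => parsed
end

# Parse the tab-separated auxiliary part of a record into a Dict.
parse_aux(tail::AbstractString) = Dict(parse_aux_field(f) for f in split(tail, '\t'))

aux = parse_aux("NM:i:5\tMD:Z:10A5\tXS:A:+")
```

Even this toy version has to make choices about error handling and type dispatch, which is part of why it makes sense to solve the problem once in a shared package.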

Plans for using XAMAuxData.jl in the BioJulia ecosystem

XAMAuxData.jl requires Julia 1.11, and will therefore not be used in BioJulia’s SAM and BAM parser until Julia 1.10 support is dropped, which may be several years from now. The current version of XAM.jl contains its own parser of auxiliary data, which this package is intended to replace.
We currently have a PAF parser awaiting registration in the general registry, which uses XAMAuxData.
BioJulia currently has no GFA parser, but it is expected that a future GFA parser will use this package.

Limited GFA support as of now

Unfortunately, the GFA specification deviates slightly from the SAM specification, as they are maintained by two different groups of people. As such, the specification for the auxiliary data is almost, but not quite, compatible.
Currently, XAMAuxData does not have a dedicated GFA module. Users may use the SAM module to get a 95% compliant parser.
When we decide to create a GFA parser in the future, XAMAuxData.jl will likely get a new GFA module which shares the majority of its code with the SAM module, but which provides an actual GFA-compliant auxiliary data type.


Just curious: which feature requires 1.11?

This uses Memory and MemoryRef indirectly, through its dependency MemoryViews.jl.
