Here I introduce the new packages LibDeflate.jl and CodecBGZF.jl. Both packages are newly registered, and I would love to get comments.
LibDeflate.jl
LibDeflate.jl is a thin wrapper around libdeflate, the fastest implementation of the DEFLATE compression algorithm that I’m aware of. DEFLATE is used in the zip, gzip and BGZF formats. As a wrapper library with a minimal interface, LibDeflate.jl is intended as a low-level building block for writing higher-level Julia packages. The package also offers a very fast implementation of the CRC32 checksum.
LibDeflate.jl differs from the more commonly used DEFLATE-related package CodecZlib.jl in the following ways:
- LibDeflate.jl is several times faster than CodecZlib.jl.
- LibDeflate.jl only supports in-memory inflation/deflation, and as such does not support streaming IO unless the stream is composed of smaller compressed blocks, each of which can be de/compressed in memory.
- LibDeflate.jl’s interface is lower-level and does not provide convenience methods.
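To make the "stream composed of smaller compressed blocks" idea concrete, here is a minimal sketch in Python, using only the standard library `zlib` module. It is not LibDeflate.jl's API; the block size and the function names `compress_blocks` and `decompress_blocks` are made up for illustration. Each chunk is deflated as an independent raw DEFLATE stream, so every block can be inflated entirely in memory.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size chosen for this sketch

def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress each fixed-size chunk of `data` as an independent raw DEFLATE stream."""
    blocks = []
    for start in range(0, len(data), block_size):
        # wbits=-15 selects raw DEFLATE with no zlib or gzip framing
        compressor = zlib.compressobj(level=6, wbits=-15)
        chunk = data[start:start + block_size]
        blocks.append(compressor.compress(chunk) + compressor.flush())
    return blocks

def decompress_blocks(blocks):
    """Inflate each block entirely in memory and join the results."""
    return b"".join(zlib.decompress(block, wbits=-15) for block in blocks)

payload = b"ACGT" * 100_000          # 400,000 bytes of sample data
blocks = compress_blocks(payload)
restored = decompress_blocks(blocks)
```

Because no block depends on its neighbours, a higher-level package can read one block at a time from a file and hand it to an in-memory decompressor, which is the pattern a blocked format relies on.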
CodecBGZF.jl
CodecBGZF.jl is a higher-level package built on LibDeflate.jl for reading and writing files in the Blocked GZip Format (BGZF). It is a codec for TranscodingStreams.jl, and therefore integrates well with the Julia file IO ecosystem.
BGZF is a format that is backwards compatible with gzip. It offers slightly worse compression ratios than normal gzip files, but it supports random access and is significantly faster to de/compress.
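The backwards compatibility and random access can be illustrated with a small Python sketch using only the standard library `gzip` module. This is a simplification and not CodecBGZF.jl code: real BGZF records each block's compressed size in a gzip extra field, whereas this sketch keeps the block offsets in a plain list. The names `write_blocked_gzip` and `read_block` and the 16 KiB block size are hypothetical.

```python
import gzip

def write_blocked_gzip(data, block_size=16 * 1024):
    """Return (blob, offsets): concatenated gzip members and each member's start offset."""
    blob = bytearray()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append(len(blob))
        blob += gzip.compress(data[start:start + block_size])
    return bytes(blob), offsets

def read_block(blob, offsets, index):
    """Decompress only the member at `index`, without touching earlier members."""
    begin = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(blob)
    return gzip.decompress(blob[begin:end])

data = bytes(range(256)) * 400          # 102,400 bytes of sample data
blob, offsets = write_blocked_gzip(data)

# Any gzip reader sees the concatenated members as one ordinary gzip file.
whole = gzip.decompress(blob)

# Random access: inflate only the fourth block.
fourth = read_block(blob, offsets, 3)
```

Compressing each block independently is also why the compression ratio is slightly worse: no block can reuse matches found in earlier blocks.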
Compared to the existing BGZFStreams.jl package, CodecBGZF.jl offers:
- 4-5x faster single-threaded read/write operations, thanks to LibDeflate.jl
- Automatic asynchronous and multithreaded de/compression, which can make it tens of times faster when using 8 threads
- All the useful high-level methods of TranscodingStreams.jl, which CodecBGZF.jl automatically “inherits” by being a codec for it
- More input validation, such as CRC32 checksumming and validation of the stated decompressed size
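As a rough illustration of what this kind of input validation involves, the Python sketch below checks the two trailer fields that every gzip member carries: the CRC32 of the uncompressed data and its size modulo 2^32. It uses only the standard library and is not taken from CodecBGZF.jl; the helper name `check_gzip_trailer` is hypothetical.

```python
import gzip
import struct
import zlib

def check_gzip_trailer(member, payload):
    """Compare the CRC32 and size stored in a gzip member's trailer with `payload`."""
    # The last 8 bytes of a gzip member are CRC32 and ISIZE, both little-endian uint32.
    stored_crc, stored_size = struct.unpack("<II", member[-8:])
    crc_ok = stored_crc == zlib.crc32(payload)
    size_ok = stored_size == len(payload) % 2**32
    return crc_ok, size_ok

original = b"GATTACA" * 1000
member = gzip.compress(original)

# Intact data passes both checks.
good = check_gzip_trailer(member, original)

# Simulate corrupted decompressed data: change the first byte, keep the length.
damaged = b"C" + original[1:]
bad = check_gzip_trailer(member, damaged)
```

Here the damaged payload keeps its length, so only the checksum comparison fails, which shows why checking both fields catches more problems than checking either one alone.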