I’m pleased to announce a new package, BufIO.jl.
Overview of BufIO.jl
BufIO provides new and improved I/O interfaces for Julia, inspired by Rust and designed around exposing buffers to users so they can explicitly copy bytes to and from them. Compared to the Base.IO interface, the new interfaces in this package are:
- Lower level
- Faster
- Easier to reason about
- Better specified, with more well-defined semantics
- Free from slow fallback methods that silently trash your performance
Besides the new interfaces, BufIO also provides a small set of basic types that make use of them and/or allow easy interoperation between Base.IO types and the new buffered interfaces.
The new types include:
- BufReader <: AbstractBufReader: A type that wraps a Base.IO to provide the new AbstractBufReader interface
- BufWriter <: AbstractBufWriter: A type that wraps a Base.IO to provide the new AbstractBufWriter interface
- CursorReader <: AbstractBufReader: Wraps any contiguous memory-backed bytes in a stateful reader
- IOReader <: Base.IO: A type that wraps an AbstractBufReader and provides the Base.IO interface
- VecWriter <: AbstractBufWriter: A faster and simpler alternative to IOBuffer, usable e.g. to build strings
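To make "designed around exposing buffers" a bit more concrete, here is a minimal sketch in plain Julia. It does not use BufIO.jl at all: the names SketchReader, buffered, refill!, consume! and next_line! are hypothetical stand-ins invented for this illustration, and the package's real API and semantics may well differ. The idea is only that the reader hands out its unconsumed bytes, the caller parses directly on them, and then tells the reader how many bytes it used.

```julia
# Hypothetical sketch in plain Julia; none of these names are BufIO.jl API.
mutable struct SketchReader{T <: IO}
    io::T
    buf::Vector{UInt8}
    start::Int  # index of the first unconsumed byte
    stop::Int   # index of the last buffered byte
end

SketchReader(io::IO; bufsize::Int = 4096) =
    SketchReader(io, Vector{UInt8}(undef, max(bufsize, 1)), 1, 0)

# Expose the buffered, unconsumed bytes without copying them.
buffered(r::SketchReader) = view(r.buf, r.start:r.stop)

# Mark the first `n` buffered bytes as used.
function consume!(r::SketchReader, n::Int)
    0 <= n <= r.stop - r.start + 1 || throw(ArgumentError("cannot consume $n bytes"))
    r.start += n
    return nothing
end

# Pull more bytes from the wrapped IO. Returns the number of bytes added (0 at EOF).
function refill!(r::SketchReader)
    n_left = r.stop - r.start + 1
    if r.start > 1
        # Move unconsumed bytes to the front to make room.
        n_left > 0 && copyto!(r.buf, 1, r.buf, r.start, n_left)
        r.start = 1
        r.stop = n_left
    end
    # If the buffer is completely full of unconsumed bytes, grow it.
    r.stop == length(r.buf) && resize!(r.buf, 2 * length(r.buf))
    chunk = read(r.io, length(r.buf) - r.stop)  # allocates; kept simple on purpose
    copyto!(r.buf, r.stop + 1, chunk, 1, length(chunk))
    r.stop += length(chunk)
    return length(chunk)
end

# Example parser: return the next line without its trailing '\n', or `nothing` at EOF.
function next_line!(r::SketchReader)
    scanned = 0  # bytes already searched for a newline
    while true
        buf = buffered(r)
        pos = findnext(==(UInt8('\n')), buf, scanned + 1)
        if pos !== nothing
            line = String(buf[1:pos-1])
            consume!(r, pos)
            return line
        end
        scanned = length(buf)
        if refill!(r) == 0
            rest = buffered(r)  # re-fetch: refill! may have moved the data
            isempty(rest) && return nothing
            line = String(rest[1:end])
            consume!(r, length(rest))
            return line
        end
    end
end
```

For example, wrapping IOBuffer("ab\ncd") in a SketchReader and calling next_line! repeatedly yields "ab", then "cd", then nothing. The point of the sketch is that all parsing logic lives in next_line! and works on the exposed bytes; the reader itself only knows how to expose, refill and consume.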
Comparison to other packages
The packages BufferedStreams.jl and TranscodingStreams.jl also provide IO wrapper types that buffer their wrapped io. However, both these packages do so as a transparent optimisation, whereas BufIO.jl provides a different interface.
History of BufIO.jl
I’ve been writing a bunch of different parsers in Julia for about five years. Each time, I’ve found it necessary to create a reader type containing a buffer, then read into the buffer, and then do all my actual parsing on the buffer. From what I hear from others, that’s also how they do it. Somehow, after years of doing that, I didn’t put two and two together about what that implied about the lack of performance and convenience of Julia’s IO interface.
A few years ago, I learned Rust. One of the most common Rust performance footguns is doing IO operations on unbuffered data (unlike Julia, Rust does not buffer several of its basic IO types).
On the other hand, Rust has the BufRead trait, which I’ve found extremely well designed and usable. That led me to believe that an IO interface should be buffered by default, and center its API around the buffer.
The penny finally dropped about a year ago, after a few packages forced me to attempt to handle generic Base.IO objects in Julia, which I found to be a miserable experience due to the many, many issues with the poorly designed Base.IO. I tried to push a (backwards compatible) extension to Base’s IO interface, but it’s difficult to make sweeping changes to Base without close collaboration with someone with commit rights. So:
Jokes aside, BufIO.jl is my vision for an alternative I/O interface in Julia, which I previously published here. Ideally, the interface should make it to Base, but I’m skeptical that will ever happen. Whether it does or not, it’s useful to prototype the interface in a package to at least be more informed about what a different I/O interface would feel like.
BufIO.jl is not yet registered, but will be soon. I’ve used this interface already in other packages and found it quite expressive and fast.
I’m very much interested in feedback and suggestions for the interface! Please open issues on the GitHub repo, BioJulia/BufIO.jl.
Notes on the design of the new interface
- BufIO has distinct AbstractBufReader and AbstractBufWriter types instead of Base’s common IO type. I’m not 100% sure which is better, but I’ve found that most use of IO involves either writing or reading, not both, and that separating the two interfaces makes implementations cleaner and less bloated. It’s also very easy to work around this and create a reader/writer type T by e.g. creating a function writer(::T)::AbstractBufWriter.
- The types in BufIO are not threadsafe by default. Locks are slow, and most programs are single threaded. If users want to protect their resources when used concurrently, they’ll need to use a lock themselves. This tradeoff is no different from that of any other data structure, such as a Dict, which we (correctly) understand in Julia should also not be threadsafe by default.
- The current BufIO performance is limited by a few limitations of Base Julia:
  - BufIO’s VecWriter currently uses Base/Core internals to manipulate Vector. Ideally, that functionality should be made public API, since I don’t think I do anything ill-advised. For now, it means BufIO uses internals, which I’m not that happy about.
  - BufIO is hard hit by the compiler limitation in issue 53584 (no ABI for pointer-ful union types). That restriction will probably be lifted soon-ish.
  - BufIO abstracts “chunks of memory” using MemoryRef, but this does not allow zero-allocation reading/writing of strings, since you can’t take a MemoryRef to a string. Hopefully that restriction will be lifted in the future.
- BufIO’s