I wanted to bring the following to the attention of anyone who might be developing any IO-related package.
Today, with “microservices” and Docker containers being common, IO often takes place over a network connection rather than through the file system. As a specific but important example, files stored in AWS S3 are (exclusively, as far as I know) fetched over HTTP. Retrieving such a file usually means fetching it into memory, without ever needing to touch the file system. In some cases, particularly if your program is running in a Docker container, you may not have any disk space allocated and want to avoid using the file system altogether. Having methods that support buffers and IO streams also has other advantages, like making it easier to support memory mapping (currently `Mmap.mmap` returns a `Vector{UInt8}`) and different types of streams.
For this reason, it’s important when developing IO packages to support in-memory buffers on an equal footing with files, both in code and in documentation. I recommend that any function that accepts a file name as an argument also have methods that accept each of the following:
- An `AbstractVector{UInt8}` (or at least `Vector{UInt8}`).
- An `IO` object.
- A filename.
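
To make that concrete, here is a minimal sketch of one way to lay out the three methods. The function name `count_lines` is made up for illustration and does not come from any existing package; the idea is simply that the real work lives in the `IO` method, and the other two forward to it.

```julia
# Hypothetical reader: counts the lines in some data. `count_lines` is an
# illustrative name, not a function from any existing package.

# Core method: all the real work happens on a generic IO stream.
count_lines(io::IO) = countlines(io)

# In-memory bytes: wrap them in an IOBuffer and reuse the IO method.
count_lines(data::AbstractVector{UInt8}) = count_lines(IOBuffer(data))

# File name: open the file and reuse the IO method; `open` closes it afterwards.
count_lines(filename::AbstractString) = open(count_lines, filename)

# Usage: the same call works for bytes already in memory (for example, the body
# of an HTTP response) and for a stream, with no file system involved.
bytes = Vector{UInt8}("a\nb\nc\n")
count_lines(bytes)
count_lines(IOBuffer(bytes))
```

With this layout, the format-specific code only has to be written once against `IO`, so supporting buffers and file names costs one line each.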
Supporting these cases will give your IO package much wider applicability. I hope you’ll at least keep the possibility of supporting them in mind. Thanks!
(As you might have guessed, I may have a few PRs incoming to some IO packages in the coming weeks.)