@sairus7 comparing HDF5 with a columnar file format is truly comparing apples and oranges. HDF5 provides neither a columnar data model for datasets nor metadata describing schemas for them. I’ve used HDF5 for more than 10 years now, and it serves as “a place to put raw bytes or a collection of ndarrays/tensors”. The memory model of data stored in Arrow format (in memory or on disk) or in Parquet format on disk is significantly different.
To make the distinction between HDF5 and Parquet clear: you can feed Parquet directly into a SQL-based query engine without any special logic, but no such thing is possible with HDF5 without layering some kind of opinionated “semantic layer” on top of it to provide a columnar interpretation of the collection of arrays stored inside.
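For illustration, here is a toy sketch of what such a “semantic layer” has to do. It assumes the HDF5 file has already been read into a dict of named 1-D arrays (stand-ins built with Python’s `array` module; no HDF5 library is used). The `schema` dict, paths such as `/trades/price`, and the functions `as_table` and `load_into_sqlite` are all hypothetical, not part of HDF5 or any existing library. Only after this interpretation step can the data be handed to a SQL engine (the standard library’s SQLite here), whereas a Parquet file carries its column names and types itself.

```python
import sqlite3
from array import array

# Stand-in for the contents of an HDF5 file: named arrays with no notion
# that they belong together as columns of one table.
raw_arrays = {
    "/trades/price": array("d", [101.5, 102.0, 99.75]),
    "/trades/qty": array("q", [10, 5, 20]),
}

# Hypothetical "semantic layer" metadata: the opinionated part that HDF5
# itself does not supply. Each entry is (column name, array path, type).
schema = {
    "table": "trades",
    "columns": [
        ("price", "/trades/price", "float64"),
        ("qty", "/trades/qty", "int64"),
    ],
}

# Mapping from logical column types to array typecodes (an assumption of this sketch).
TYPECODES = {"float64": "d", "int64": "q"}


def as_table(arrays, schema):
    """Interpret named 1-D arrays as a columnar table according to `schema`.

    Checks that each referenced array exists, has the declared element type,
    and that all columns have the same length; returns {column: [values]}.
    """
    columns = {}
    length = None
    for name, path, logical_type in schema["columns"]:
        if path not in arrays:
            raise KeyError(f"column {name!r}: no array at {path!r}")
        arr = arrays[path]
        if arr.typecode != TYPECODES[logical_type]:
            raise TypeError(f"column {name!r}: expected {logical_type}, got {arr.typecode!r}")
        if length is None:
            length = len(arr)
        elif len(arr) != length:
            raise ValueError(f"column {name!r} has {len(arr)} values, expected {length}")
        columns[name] = list(arr)
    return columns


def load_into_sqlite(conn, table_name, columns):
    """Create `table_name` and insert the columns row by row.

    Identifiers come from the trusted schema, so they are formatted directly;
    values go through SQLite's "?" placeholders.
    """
    names = list(columns)
    conn.execute(f"CREATE TABLE {table_name} ({', '.join(names)})")
    placeholders = ", ".join("?" * len(names))
    rows = zip(*(columns[n] for n in names))
    conn.executemany(f"INSERT INTO {table_name} VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")
table = as_table(raw_arrays, schema)
load_into_sqlite(conn, schema["table"], table)
total = conn.execute("SELECT SUM(qty) FROM trades WHERE price > 100").fetchone()[0]
print(total)
```

The schema is exactly the “opinionated” piece: a file laid out under a different convention (other paths, other types, or one compound dataset instead of one array per column) would need a different layer.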
@ExpandingMan I wrote about Feather’s history and trajectory in my blog post “Feather format update: Whence and Whither?”. The plan discussed there is indeed to deprecate the “feather.fbs” file and have Feather be simply an alias for the Arrow IPC file format.
I waited patiently for the R community to build bindings for the Arrow C++ library and get them on CRAN, but that did not happen until August 2019, nearly 3.5 years after the initial release of Feather. Until then, it wasn’t possible for me to modify the format. We have no plans to do any further development in the original wesm/feather repository on GitHub.
I was a little puzzled why you guys kept such a low profile and didn’t advertise that format more
This seems like a pretty subjective judgment. Perhaps we did not “market” the binary protocol in the way you’re suggesting. We’re not trying to displace other storage formats, for example. We’ve created a lot of technical content (blog posts, slide decks, etc.) illustrating the performance and interoperability benefits of using the Arrow format. It is being used in numerous downstream open source applications and many more proprietary ones. Judging by implementation maturity and downstream adoption, it seems we’ve reached many of our intended audiences.
At the end of the day, we are an open source community, and we do not have any commercial entities profiting directly from adoption of Apache Arrow. I would rather have Julia developers be part of our community and work together with us on these problems, including the technical evangelism.