Cannot read file written by Arrow.jl in Python

I’m using a Julia environment with Arrow v2.5.2 and DataFrames v1.5.1, and Python 3.10.2 with pandas 1.5.0 and pyarrow 11.0.0. From the user manual, I thought I could do:

using Arrow, DataFrames

m = rand(20,10)

open("examplematrix.feather", "w") do io
    Arrow.write(io, DataFrame(m, :auto))
end

And then read in Python with either:

import pandas
pandas.read_feather("examplematrix.feather")

or

from pyarrow import feather
feather.read_feather("examplematrix.feather")

But in either case I get:

ArrowInvalid                              Traceback (most recent call last)
Cell In [19], line 1
----> 1 pd.read_feather("examplematrix.feather")

File c:\Program Files\Python310\lib\site-packages\pandas\io\feather_format.py:132, in read_feather(path, columns, use_threads, storage_options)
    126 from pyarrow import feather
    128 with get_handle(
    129     path, "rb", storage_options=storage_options, is_text=False
    130 ) as handles:
--> 132     return feather.read_feather(
    133         handles.handle, columns=columns, use_threads=bool(use_threads)
    134     )

File c:\Program Files\Python310\lib\site-packages\pyarrow\feather.py:226, in read_feather(source, columns, use_threads, memory_map, **kwargs)
    199 def read_feather(source, columns=None, use_threads=True,
    200                  memory_map=False, **kwargs):
    201     """
    202     Read a pandas.DataFrame from Feather format. To read as pyarrow.Table use
    203     feather.read_table.
   (...)
    224         The contents of the Feather file as a pandas.DataFrame
    225     """
--> 226     return (read_table(
    227         source, columns=columns, memory_map=memory_map,
...
File c:\Program Files\Python310\lib\site-packages\pyarrow\error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File c:\Program Files\Python310\lib\site-packages\pyarrow\error.pxi:100, in pyarrow.lib.check_status()

ArrowInvalid: Not a Feather V1 or Arrow IPC file

I also tried feather.read_table, as suggested in the docstring shown in the traceback, but I get the same ArrowInvalid error.

Am I doing something wrong? I haven’t found other messages reporting this problem.

EDIT: With Feather.jl it does work, but I want compression support, which that package does not have as far as I know.

Solved. It works if I pass Arrow.write the file path as a string instead of an IOStream.

I don’t know exactly what you did in the first case, but it doesn’t seem to be a valid Arrow file (could you even read it back in Julia?). It’s 2704 bytes, while the other is 3266 bytes (you also have an extra period in your example; otherwise it worked), and only the latter starts with the text ARROW1, which I suppose is a magic cookie that signals the Arrow file format. Still, the “file” command may not be aware of it, and shows this in both cases:

shell> file examplematrix.feather
examplematrix.feather: data

Yes, both files are read back successfully in Julia, and have the same contents. However, you are right: according to the specification (Arrow Columnar Format — Apache Arrow v14.0.0) the file created with open looks wrong.

I opened an issue commenting on that:

EDIT: And I have just closed it, after re-reading the docstring and finding that if I pass an IOStream for a file, I should add the keyword argument file=true.