Passing an Arrow Table from Python to Julia

Hi,

I have a Python job that calls Julia for some computation on my datasets. Right now, passing data back and forth between Python and Julia is a bottleneck: my current process is to save the pandas DataFrame to disk as a Feather file and then load that file from Julia.
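Concretely, the disk-based round trip looks roughly like this (a minimal sketch; "data.feather" is just an example path, and the Julia side is shown as comments):

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["bob", "sam"]})
df.to_feather("data.feather")  # Python side: write the frame to disk as a Feather (Arrow IPC) file

# Julia side:
#   using Arrow
#   tbl = Arrow.Table("data.feather")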

I know PyJulia can pass numpy arrays with zero copying, and I’m trying to figure out whether the same thing is possible with an Arrow Table. However, the table gets passed as a PyObject, and the Arrow library in Julia doesn’t seem to be able to convert it, even on the latest version.
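The numpy case that already works looks roughly like this (a minimal sketch, assuming PyJulia is installed and can find a Julia runtime):

import numpy as np
import julia
jl = julia.Julia(compiled_modules=False)
from julia import Base

x = np.arange(3)
Base.sum(x)  # returns 3; the numpy array is handed straight to Julia, no file in between

By contrast, here is what happens with the Arrow table: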

import pandas as pd
import pyarrow

df = pd.DataFrame({"id": [1, 2],
                   "name": ["bob", "sam"]})
table = pyarrow.Table.from_pandas(df)


import julia
jl = julia.Julia(compiled_modules=False)
from julia import Arrow
from julia import Base
Base.length([1, 2, 3])
>> 3
Base.length(table)
>> 2
Base.typeof(table)
>> <PyCall.jlwrap PyObject>
Arrow.Table(table)
RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: no method matching Arrow.Table(::PyObject)

Is there a better way to pass datasets between the two?

Arrow.Table needs to be given either the path to an arrow-formatted file as a string (like Arrow.Table(file)) or a byte vector (Vector{UInt8}). You can see an example, at least from the Julia side, of “round tripping” arrow data with pyarrow here: arrow-julia/pyarrow_roundtrip.jl at main · apache/arrow-julia · GitHub. So in your case, I’d see if there’s a way to get access to the raw arrow-formatted bytes from your table object.
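For example, on the Python side something along these lines should get the raw IPC bytes out (a sketch; table is the pyarrow Table from your first post):

import pyarrow as pa

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)              # serialize the whole table in Arrow IPC stream format
raw_bytes = sink.getvalue().to_pybytes()   # raw arrow-formatted bytes, ready to hand to Arrow.Table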


Thanks! I was able to get a PyJulia version working with a few small modifications:

df = pd.DataFrame({"id": [1, 2],
                  "name": ["bob", "sam"]})
batch = pyarrow.record_batch(df)
sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, batch.schema)
writer.write_batch(batch)
writer.close()
buf = sink.getvalue()
jbytes = buf.to_pybytes()
tt = Arrow.Table(bytearray(jbytes))