Say I am using PyCall.jl to start a Python session. Can I receive an Arrow vector from Python without having to write it to disk?
What’s the mechanism to do so? A simple MWE would be highly appreciated!
Also, if I call Julia from Python, is it possible to receive a Julia vector via Arrow in Python?
Yes! To pass Arrow data back and forth, you first convert it to a byte array in the sender, and then interpret that byte array in the receiver. Here are some functions I wrote to do so, with dataframes:
# Python side
import pandas as pd
import pyarrow as pa

def convert_to_arrow_bytes(df: pd.DataFrame) -> bytes:
    """
    Efficiently convert a dataframe to arrow bytes in memory
    For transfer to other processes
    Modified from https://github.com/JuliaData/Arrow.jl/blob/main/test/pyarrow_roundtrip.jl
    """
    batch = pa.record_batch(df)
    sink = pa.BufferOutputStream()
    writer = pa.ipc.new_stream(sink, batch.schema)
    writer.write_batch(batch)  # write the data into the in-memory sink
    writer.close()  # finish the stream before reading the buffer
    buf = sink.getvalue()
    jbytes = buf.to_pybytes()
    return jbytes

def receive_arrow_bytes(byte_array: bytearray) -> pd.DataFrame:
    reader = pa.ipc.open_stream(byte_array)
    pyarrow_table = reader.read_all()
    return pyarrow_table.to_pandas()
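# Julia side
using Arrow, DataFrames

function receive_arrow_bytes(bytearray)
    return bytearray |> Arrow.Table |> DataFrame |> disallowmissing!
end

function convert_to_arrow_bytes(df)
    io = IOBuffer()
    Arrow.write(io, df)  # write the Arrow stream format into the in-memory buffer
    byte_array = take!(io)
    return byte_array
end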
This is awesome! I should’ve seen it in the test folder too.
This will be so cool!
How do Python and Julia know which byte array to read from?
Is there an MWE showing how to do that from Julia and Python?
Say I have file.jl: what in there allows me to call a Python function to generate a dataframe and then receive it in Julia?
The other way around, in a file2.py, say I call a Julia function to create a dataframe: how do I receive it in Python?