Could someone share a simple example of using Arrow.jl to pass data to and from Python?

Say I am using PyCall.jl to start a Python session. Can I receive an Arrow vector from Python without having to write it to disk?

What’s the mechanism to do so? A simple MWE would be highly appreciated!

Also, if I call Julia from Python, is it possible to receive a Julia vector via Arrow in Python?

Yes! To pass Arrow data back and forth, you first serialize it to a byte array on the sending side, and then parse that byte array on the receiving side. Here are some functions I wrote to do so, with dataframes:

Python:

import pandas as pd
import pyarrow as pa

def convert_to_arrow_bytes(df: pd.DataFrame) -> bytearray:
    """
    Efficiently convert a dataframe to arrow bytes in memory
    For transfer to other processes
    Modified from https://github.com/JuliaData/Arrow.jl/blob/main/test/pyarrow_roundtrip.jl
    """
    batch = pa.record_batch(df)
    sink = pa.BufferOutputStream()
    writer = pa.ipc.new_stream(sink, batch.schema)
    writer.write_batch(batch)
    writer.close()
    buf = sink.getvalue()
    jbytes = buf.to_pybytes()
    return bytearray(jbytes)

def receive_arrow_bytes(byte_array: bytearray) -> pd.DataFrame:
    reader = pa.ipc.open_stream(byte_array)
    pyarrow_table = reader.read_all()
    return pyarrow_table.to_pandas()

Julia:

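using Arrow, DataFrames

# Parse Arrow bytes into a DataFrame; disallowmissing! throws if any column contains missing values.
function load_df(bytearray::Vector{UInt8})::DataFrame
    return bytearray |> Arrow.Table |> DataFrame |> disallowmissing!
end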

function df_to_arrow_bytes(df)
    io = IOBuffer()
    Arrow.write(io, df)
    # take! returns everything written to the buffer as a Vector{UInt8}
    return take!(io)
end

This is awesome! I should’ve seen it in the test folder too.

This will be so cool!

How do Python and Julia know which byte array to read from?

Is there an MWE showing how to do that from both Julia and Python?

Say I have file.jl: what in there lets me call a Python function to generate a dataframe and then receive it in Julia?

And the other way around: in a file2.py, say I call a Julia function to create a dataframe. How do I receive it in Python?