Hi,
I have a Python job that calls Julia for some computation on my datasets. Right now, passing data back and forth between Julia and Python is a bottleneck – my current process is to save the pandas DataFrame as a Feather file on disk and then load it from Julia.
I know PyJulia can pass NumPy arrays without copying, and I'm trying to figure out whether the same is possible with an Arrow Table. However, a table gets passed as a PyObject, and the Arrow library in Julia doesn't seem to be able to convert it, even on the latest version.
import pandas as pd
import pyarrow
df = pd.DataFrame({"id": [1, 2],
                   "name": ["bob", "sam"]})
table = pyarrow.Table.from_pandas(df)
import julia
jl = julia.Julia(compiled_modules=False)
from julia import Arrow
from julia import Base
Base.length([1, 2, 3])
>> 3
Base.length(table)
>> 2
Base.typeof(table)
>> <PyCall.jlwrap PyObject>
Arrow.Table(table)
RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: no method matching Arrow.Table(::PyObject)
Is there a better way to pass datasets between the two?