PyJulia - accessing data in a Julia DataFrame that was called from Python

00krishna · December 10, 2019, 5:24am

I had a bunch of help from @stevengj solving the problem of calling Julia from Python, which was great. I can now write a Julia function that generates a DataFrame and call it from Python. But I am not sure how to actually access the elements of the DataFrame, since the returned object is a PyCall.jlwrap. Let me give an example.

If I have a Julia file like so:

#jinclude.jl
using DataFrames

function myDf(row, columns)
  # DataFrames.jl 1.0 and later needs DataFrame(rand(row, columns), :auto)
  DataFrame(rand(row, columns))
end

Then I have a Python script like this:

#Pythonscript.py

import julia
jl = julia.Julia(compiled_modules=False)
jl.include('jinclude.jl')  # my file with functions
from julia import Main
a = Main.myDf(3,4) # function from `jinclude.jl` file.

So a holds a DataFrame, but if I try to access its elements, such as a[1,1], I get the error TypeError: 'PyCall.jlwrap' object is not subscriptable. This is understandable, as I did not expect the data structures to pass smoothly between languages. But does anyone know a good way to destructure or export the PyCall.jlwrap so that I can pull the elements of the DataFrame into Python?

tkf · December 10, 2019, 5:49am

I usually let Pandas.jl handle the conversion and receive the dataframe as a pandas.DataFrame on the Python side.

#jinclude.jl
using DataFrames
import Pandas

function myDf(row, columns)
  Pandas.DataFrame(DataFrame(rand(row, columns)))
end

00krishna · December 10, 2019, 6:29am

Oh yeah, I just tried this and it worked great. Wow, this is so much easier than trying to do stuff with passing pointers to C arrays and Numpy, etc. Thanks so much for your tip here.

tk3369 · July 14, 2020, 1:47am

I’m dealing with the same issue but I have missing data… How do I get around this?

JuliaError: Exception 'ArgumentError: Can't create a Pandas.DataFrame from a source that has missing data.' occurred while calling julia code:
df = Pandas.DataFrame(DataFrames.DataFrame(x=1:2, y=[3,missing], dt=[now(),missing]))

lungben · July 14, 2020, 4:44am

You could try replacing the missing data with a dummy value (coalesce), or dropping the rows with missing values altogether.
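To make the two options concrete, here is a minimal sketch of the logic only, written in Python on plain rows rather than on a real DataFrame. The rows, the function names fill_missing and drop_missing, and the dummy value are all made up for illustration, with None standing in for Julia's missing. In Julia itself the corresponding tools would be coalesce.(column, value) and dropmissing(df) from DataFrames.jl, applied before the Pandas.DataFrame conversion.

```python
# Hypothetical rows standing in for a small table with one missing entry;
# None plays the role of Julia's `missing`.
rows = [{"x": 1, "y": 3.0}, {"x": 2, "y": None}]


def fill_missing(rows, dummy):
    """Option 1: replace every missing (None) value with a dummy value."""
    return [
        {key: (dummy if value is None else value) for key, value in row.items()}
        for row in rows
    ]


def drop_missing(rows):
    """Option 2: keep only the rows that contain no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row.values())]


filled = fill_missing(rows, dummy=-1.0)
dropped = drop_missing(rows)
```

Filling keeps the row count but needs a dummy that cannot be confused with real data; dropping keeps the data honest but loses rows.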

00krishna · July 16, 2020, 2:20pm

@tk3369 what I did was convert all missing values to NAs or NaNs in Julia before passing the data to Python. The error message is really confusing because it does not explain the source of the error. As far as I can tell, the error occurs because the Julia Pandas package does not implement the DataFrame constructor for the missing data type, so the call falls back to an implementation further up the dispatch chain and fails.

I opened an issue about this on the Pandas.jl GitHub repository.

https://github.com/JuliaPy/Pandas.jl/issues/71

In my case I just had a column of missing values. I converted that column to NAs or NaNs, then ran Pandas.DataFrame(), and it worked.
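One thing to keep in mind with the NaN approach is that the placeholders then have to be recognised again on the Python side. Here is a minimal sketch, assuming the column arrives as ordinary Python floats; the column values and the helper name nan_to_none are made up for illustration. NaN never compares equal to itself, so the check has to use math.isnan rather than ==.

```python
import math

# Hypothetical column as it might look in Python after missing values
# were replaced with NaN on the Julia side.
column = [3.0, float("nan"), 5.5]


def nan_to_none(values):
    """Map float NaN placeholders back to None; leave other values untouched."""
    return [
        None if isinstance(value, float) and math.isnan(value) else value
        for value in values
    ]


restored = nan_to_none(column)
```

On a real pandas.DataFrame, the pandas functions isna and dropna cover the same ground without a hand-written helper.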
