PyJulia - accessing data in a Julia DataFrame returned to Python

I got a lot of help from @stevengj with calling Julia from Python, which was great. I can now write a function in Julia that generates a DataFrame and call it from Python. But I am not sure how to actually access the elements of the DataFrame, since the returned object is a PyCall.jlwrap. Let me give an example.

If I have a Julia file like so:

#jinclude.jl
using DataFrames

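function myDf(row, columns)
  # :auto generates the column names x1, x2, ...; DataFrames.jl 1.0 and later require it for a matrix
  DataFrame(rand(row, columns), :auto)
end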

And I have a Python script like this:

#Pythonscript.py

import julia
jl = julia.Julia(compiled_modules=False)
jl.include('jinclude.jl')  # my file with functions
from julia import Main
a = Main.myDf(3,4) # function from `jinclude.jl` file.

So a holds a DataFrame, but when I try to access its elements, for example with a[1,1], I get the error TypeError: 'PyCall.jlwrap' object is not subscriptable. This is understandable, as I did not expect the data structures to pass smoothly between languages. But does anyone know a good way to unpack or export the PyCall.jlwrap so that I can pull the elements of the DataFrame into Python?

I usually let Pandas.jl handle the conversion and receive the data frame as a pandas.DataFrame on the Python side.

#jinclude.jl
using DataFrames
import Pandas

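function myDf(row, columns)
  # Build the Julia DataFrame first (:auto names the columns x1, x2, ...), then convert it for Python
  Pandas.DataFrame(DataFrame(rand(row, columns), :auto))
end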

Oh yeah, I just tried this and it worked great. Wow, this is so much easier than passing pointers to C arrays, going through NumPy, and so on. Thanks so much for the tip.
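For anyone who cannot add Pandas.jl, here is a rough sketch of another way to pull the data out, under some assumptions: that Julia is already initialised as in the script above, that the wrapped object can be handed back to Julia's names and Matrix functions, and that the resulting matrix arrives in Python as something iterable row by row (typically a NumPy array). The helper names to_records and julia_dataframe_to_records are made up for illustration.

```python
def to_records(column_names, rows):
    """Pair each row's values with the column names, giving one dict per row."""
    return [dict(zip(column_names, row)) for row in rows]


def julia_dataframe_to_records(df):
    """Turn a wrapped Julia DataFrame into a list of Python dicts.

    Assumes pyjulia is installed and Julia has been initialised already.
    """
    # Imported lazily so that to_records stays usable without Julia.
    from julia import Main

    # Look up the Julia functions in Main and apply them to the wrapped object.
    column_names = list(Main.eval("names")(df))
    matrix = Main.eval("Matrix")(df)

    # Iterate row by row; this works for a NumPy array or nested lists.
    return to_records(column_names, [list(row) for row in matrix])
```

With the script above, julia_dataframe_to_records(a) should give one dict per row, so the first row would be records[0]. This copies the data and suits small frames; for anything larger the Pandas.jl route is probably the better choice.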

I’m dealing with the same issue but I have missing data… How do I get around this?

JuliaError: Exception 'ArgumentError: Can't create a Pandas.DataFrame from a source that has missing data.' occurred while calling julia code:
df = Pandas.DataFrame(DataFrames.DataFrame(x=1:2, y=[3,missing], dt=[now(),missing]))

You could try replacing the missing data with a dummy value using coalesce, or dropping the rows that contain missing values altogether.
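To make the two options concrete, here is a small stdlib-only Python sketch of the idea, with None standing in for Julia's missing and a plain dict of lists standing in for the data frame. The function names are hypothetical. In Julia itself the corresponding tools would be coalesce.(column, value) and dropmissing(df), applied before the Pandas.DataFrame conversion.

```python
import math


def coalesce_column(values, fill=math.nan):
    """Replace every None in a column with the fill value."""
    return [fill if v is None else v for v in values]


def drop_missing_rows(columns):
    """Keep only the rows in which no column holds None.

    `columns` maps each column name to a list; all lists have equal length.
    """
    names = list(columns)
    # Walk the columns in parallel so that each tuple is one row.
    kept = [
        row for row in zip(*columns.values())
        if all(v is not None for v in row)
    ]
    return {name: [row[i] for row in kept] for i, name in enumerate(names)}
```

For example, with columns x = [1, 2] and y = [3, None], dropping rows leaves only the first row, while coalescing y with fill=0 gives [3, 0]. Which option fits depends on whether a dummy value would distort later calculations.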

@tk3369 what I did was convert all missing values to NAs or NaNs in Julia before pushing to Python. The error message is confusing because it does not explain where the error comes from. As far as I can tell, the Julia Pandas package does not implement its DataFrame function for the Missing type, so the call falls back to a more general method up the dispatch chain and fails.

I opened an issue about this on the Pandas.jl GitHub repository.

In my case I just had one column of missing values. I converted that column to NAs or NaNs, then ran Pandas.DataFrame() and it worked.