Converting a pandas DataFrame returned from PyCall to a Julia DataFrame

To my knowledge, pandas DataFrames cannot be converted directly to DataFrames.jl DataFrames.
However, the underlying data structures (NumPy arrays and Julia Arrays) can be passed between Python and Julia very efficiently with PyCall, so the conversion can be done column by column:

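```julia
using PyCall
using DataFrames

pd = pyimport("pandas")
df_pd = pd.read_csv("test_data.csv")

# Copy each pandas column into a new DataFrames.jl column.
function pd_to_df(df_pd)
    df_jl = DataFrame()
    for col in df_pd.columns
        # Fetch the pandas Series for this column by name; .values converts
        # its NumPy array into a Julia Array.
        # Caveat: if a column name clashes with a pandas DataFrame method
        # (e.g. "count"), attribute access returns the method, not the column.
        df_jl[!, col] = getproperty(df_pd, col).values
    end
    return df_jl
end

df_julia = pd_to_df(df_pd)
```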
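One caveat with the loop above is that it relies on attribute access, so a column whose name matches a pandas DataFrame method (such as `count`) resolves to the method rather than to the column. A minimal sketch of a variant that instead looks columns up the way Python's `df[col]` does (calling `__getitem__` through PyCall's `pycall`) and uses `to_numpy()`, which pandas recommends over `.values`. The name `pd_to_df_safe` and the sample data are hypothetical; like the original, it assumes string column labels, and numeric columns are the straightforward case (object-dtype columns such as strings may need extra handling).

```julia
using PyCall
using DataFrames

# Hypothetical helper: same idea as pd_to_df, but each column is fetched with
# Python's df[col] (via __getitem__) instead of attribute access.
function pd_to_df_safe(df_pd::PyObject)
    df_jl = DataFrame()
    for col in df_pd.columns
        # Equivalent to df_pd[col] in Python; keep the Series as a raw PyObject.
        series = pycall(df_pd.__getitem__, PyObject, col)
        # to_numpy() returns a NumPy array, which PyCall converts to a Julia Array.
        df_jl[!, col] = series.to_numpy()
    end
    return df_jl
end

# Usage sketch: "count" would collide with DataFrame.count under attribute access.
pd = pyimport("pandas")
df_pd = pd.DataFrame(Dict("count" => [1.0, 2.0, 3.0], "x" => [4, 5, 6]))
df_jl = pd_to_df_safe(df_pd)
```

The per-column loop is kept from the original; only the lookup and the array accessor change.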

Performance is very good: about 0.1 s for 450k rows and 20 columns on my fairly weak machine.
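To check a figure like this on another machine, a rough timing sketch: it builds a synthetic pandas DataFrame with the same shape (450,000 rows by 20 columns) and times only the conversion, after one warm-up call so that Julia's compilation time is not counted. The random Float64 data and the column names `c1` to `c20` are assumptions standing in for `test_data.csv`; real timings will vary with the machine and the column types.

```julia
using PyCall
using DataFrames

pd = pyimport("pandas")

# Same conversion as pd_to_df above, repeated so this block runs on its own.
function pd_to_df(df_pd)
    df_jl = DataFrame()
    for col in df_pd.columns
        df_jl[!, col] = getproperty(df_pd, col).values
    end
    return df_jl
end

# Synthetic data with the quoted shape (assumption: all-Float64 columns).
nrows, ncols = 450_000, 20
mat = rand(nrows, ncols)
colnames = ["c$i" for i in 1:ncols]
df_pd = pd.DataFrame(mat, columns=colnames)   # Julia Matrix is passed as a NumPy array

df_jl = pd_to_df(df_pd)          # warm-up call: includes compilation
t = @elapsed pd_to_df(df_pd)     # timed call: conversion only
println("converted $(nrow(df_jl)) rows x $(ncol(df_jl)) columns in $(round(t; digits=3)) s")
```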
