Error converting from a DataFrames.DataFrame to a Pandas.DataFrame

I have a pretty large DataFrames.jl dataframe that I am using for some data simulations. However, I need to pass the data back to Python for some visualizations, and @tkf suggested that I first convert the DataFrames DataFrame to a Pandas.jl dataframe and then pass it to Python using pyjulia. Sounds easy enough.

But for some reason, I am getting an error when converting from one type of Dataframe to the other type within Julia.

The basic code looks like this:

using DataFrames
import Pandas

function generate_df()
    df = run_simulation()  ## df is a DataFrames.DataFrame
    Pandas.DataFrame(df)
end

But I am getting the error below, and I am not sure what the actual problem is. It seems some conversion is occurring between a Union{} type and Float64, but I am not sure which column in the original dataframe this refers to. Is it related to missing values, or something else?

Does anyone have ideas on how to resolve this?

ERROR: LoadError: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous. Candidates:
  convert(::Type{Union{}}, x) in Base at essentials.jl:169
  convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
  convert(::Type{T}, arg) where T<:VecElement in Base at baseext.jl:8
  convert(::Type{T}, x::Number) where T<:AbstractChar in Base at char.jl:179
Possible fix, define
  convert(::Type{Union{}}, ::Number)
Stacktrace:
 [1] setindex!(::Array{Union{},1}, ::Float64, ::Int64) at ./array.jl:825
 [2] _construct_pandas_from_iterabletable(::DataFrame) at /home/krishnab/.julia/packages/Pandas/rAPmB/src/tabletraits.jl:37
 [3] DataFrame at /home/krishnab/.julia/packages/Pandas/rAPmB/src/Pandas.jl:457 [inlined]
 [4] run_julia_model(::Dict{String,Any}, ::Int64, ::Int64) at /media/krishnab/lakshmi/sandbox/julia/pyjulia/test_julia.jl:6
 [5] top-level scope at /media/krishnab/lakshmi/sandbox/julia/pyjulia/test_julia.jl:24
 [6] include(::Module, ::String) at ./Base.jl:377
 [7] exec_options(::Base.JLOptions) at ./client.jl:288
 [8] _start() at ./client.jl:484

This seems like just a bug in Pandas.jl. Maybe open an issue?

In particular, this does not handle the T == Union{} case well, because Union{} <: S holds for any type S (including AbstractFloat).
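To make the point about Union{} concrete, here is a minimal sketch in plain Julia (no Pandas.jl involved). It only illustrates the general behavior of the bottom type and is not the actual Pandas.jl code path. The exact exception type depends on the Julia version, so the sketch just reports whatever is thrown.

```julia
# Union{} is the bottom type: it is a subtype of every type,
# so a check like `T <: AbstractFloat` is also true for T == Union{}.
@assert Union{} <: AbstractFloat
@assert Union{} <: String

# A container whose element type is Union{} can never hold a value,
# so storing a Float64 in it fails during conversion.
v = Union{}[]
threw = try
    push!(v, 1.0)
    false
catch err
    println("push! failed with ", typeof(err))
    true
end
```

This resembles the failing setindex!(::Array{Union{},1}, ::Float64, ::Int64) call at the top of the stack trace.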

@tkf Oh thanks so much, so interesting that it is a bug in Pandas.jl. Hmm, I can open an issue with them. Thanks for always being so helpful and available.

But I am not clear on what the underlying problem is. How am I getting a Union{} type? Is it coming from a specific column? When I checked the column datatypes of the original dataframe using unique(eltypes(df)), the list is:

 Int64
 Union{Missing, Int64}
 Float64
 Union{Missing, Float64}
 Missing
 String

So I don’t have any Union{}-only types, though I guessed that it is the supertype of the Union. Is the multiple dispatch defaulting to the Union{} type because it does not have an implementation for the specific Union{Missing, Float64} type or something? Or could it be that I have an unimplemented column that is all Missing, and that is throwing it off?

I will see if I can dev the Pandas package and step through that section of code to trap the error. If I delete columns one by one, I can find the offending column.
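As an alternative to deleting columns one by one, the all-Missing columns can be listed directly. The sketch below uses a small made-up dataframe (the column names a, b, c are hypothetical) and assumes, based on the eltypes shown above, that a column with element type Missing is the likely culprit. The final widening step is an untested guess at a workaround; it has not been verified against Pandas.jl.

```julia
using DataFrames

# Hypothetical stand-in for the simulation output.
df = DataFrame(a = [1, 2], b = [missing, missing], c = [1.0, missing])

# Collect the names of columns whose element type is exactly Missing.
all_missing = [name for name in names(df) if eltype(df[!, name]) === Missing]
println("All-missing columns: ", all_missing)

# Possible workaround (assumption): widen those columns so the element
# type still allows Float64 values alongside missing.
for name in all_missing
    df[!, name] = convert(Vector{Union{Missing, Float64}}, df[!, name])
end
```

Dropping those columns before the conversion would be another option if they are not needed on the Python side.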

Yeah, it looks like that’s how DataValues.jl works. A missing value becomes a DataValue whose element type is Union{}:

julia> DataValue(missing)
DataValue{Union{}}()

julia> eltype(ans)
Union{}

Anyway, I think it’s better to open an issue in Pandas.jl repo.

@tkf thanks. Okay, this makes more sense. I will open an issue with Pandas.jl. The type system still throws me off sometimes, but I am slowly getting the hang of the errors. That is a great idea, to see how DataValue is registering these values and columns. I was going through and converting each column to a Pandas.DataFrame one at a time to find the problem column, but your approach is probably faster.

@tkf I opened a new issue against Pandas.jl for this.

https://github.com/JuliaPy/Pandas.jl/issues/71

Just wanted to let you know. No need to respond.