I have a fairly large DataFrames.jl DataFrame that I am using for some data simulations. However, I need to pass the data back to Python for some visualizations, and @tkf suggested that I first convert the DataFrames.jl DataFrame to a Pandas.jl DataFrame and then pass it to Python using pyjulia. Sounds easy enough.
But for some reason, I am getting an error when converting from one type of DataFrame to the other within Julia.
My basic code looks like this:
using DataFrames
import Pandas
function generate_df()
    df = run_simulation()  # df is a DataFrames.DataFrame
    Pandas.DataFrame(df)
end
But I am getting the error below, and I am not sure what the actual problem is. It looks like something is trying to convert a Float64 to the Union{} type, but I cannot tell which column in the original dataframe this refers to. Is it related to missing values, or something else?
Does anyone have ideas on how to resolve this?
ERROR: LoadError: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous. Candidates:
convert(::Type{Union{}}, x) in Base at essentials.jl:169
convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
convert(::Type{T}, arg) where T<:VecElement in Base at baseext.jl:8
convert(::Type{T}, x::Number) where T<:AbstractChar in Base at char.jl:179
Possible fix, define
convert(::Type{Union{}}, ::Number)
Stacktrace:
[1] setindex!(::Array{Union{},1}, ::Float64, ::Int64) at ./array.jl:825
[2] _construct_pandas_from_iterabletable(::DataFrame) at /home/krishnab/.julia/packages/Pandas/rAPmB/src/tabletraits.jl:37
[3] DataFrame at /home/krishnab/.julia/packages/Pandas/rAPmB/src/Pandas.jl:457 [inlined]
[4] run_julia_model(::Dict{String,Any}, ::Int64, ::Int64) at /media/krishnab/lakshmi/sandbox/julia/pyjulia/test_julia.jl:6
[5] top-level scope at /media/krishnab/lakshmi/sandbox/julia/pyjulia/test_julia.jl:24
[6] include(::Module, ::String) at ./Base.jl:377
[7] exec_options(::Base.JLOptions) at ./client.jl:288
[8] _start() at ./client.jl:484
@tkf Oh thanks so much, so interesting that it is a bug in Pandas.jl. Hmm, I can open an issue with them. Thanks for always being so helpful and available.
But I am not clear on what the underlying problem is. How am I getting a Union{} type? Is it coming from a specific column? When I checked the column datatypes for the original dataframe using unique(eltypes(df)) the list is:
So none of my columns has Union{} by itself as its element type, and I am not sure how Union{} relates to types like Union{Missing, Float64}. Is the multiple dispatch falling back to Union{} because there is no implementation for the specific Union{Missing, Float64} type? Or could it be that I have a column that is all missing and that is throwing it off?
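For what it is worth, here is a small check that can be run in plain Julia (no packages) to see what Union{} actually is. As far as I understand, it is the empty "bottom" type: a subtype of every type, with no values of its own, so storing anything into a Vector{Union{}} has to fail. That would match frame [1] of the stack trace. This is only a sketch of the type behavior; it does not show how Pandas.jl ends up with such a buffer, and the exact exception raised depends on the Julia version.

```julia
# Union{} is the bottom type: a subtype of everything, a supertype of nothing else.
@show Union{} <: Float64                   # true
@show Union{} <: Union{Missing, Float64}   # true
@show Float64 <: Union{}                   # false

# A Vector{Union{}} can be allocated, but no value can ever be stored in it.
buf = Vector{Union{}}(undef, 1)
store_failed = try
    buf[1] = 1.0   # same kind of call as setindex!(::Array{Union{},1}, ::Float64, ::Int64)
    false
catch err
    # The exception type and message differ between Julia versions.
    println("storing failed with: ", typeof(err))
    true
end
@show store_failed
```

So the Union{} in the error is probably not one of my column types; it looks more like the element type of a buffer that was created without knowing the real column type.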
I will see if I can dev the Pandas package and step through that section of code to trap the error. If I delete columns one by one, I can find the offending column.
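Rather than deleting columns by hand, the search could be automated along these lines. This is a minimal sketch: find_failing is a hypothetical helper name of my own, not part of DataFrames.jl or Pandas.jl, and the demo at the bottom uses a stand-in conversion from Base so that it runs without either package.

```julia
# Call `try_one(name)` for every name and collect the names that throw,
# together with the exception, instead of stopping at the first error.
function find_failing(try_one, names)
    failures = Pair{String,Any}[]
    for name in names
        try
            try_one(name)
        catch err
            push!(failures, string(name) => err)
        end
    end
    return failures
end

# Intended usage with my dataframe (assumes DataFrames and Pandas are loaded):
#   find_failing(names(df)) do name
#       Pandas.DataFrame(df[:, [name]])   # convert a one-column DataFrame
#   end

# Base-only demo: a stand-in converter that fails on an all-missing column.
demo = Dict("a" => [1.0, 2.0], "b" => [missing, missing])
bad = find_failing(sort(collect(keys(demo)))) do name
    convert(Vector{Float64}, demo[name])
end
println(first.(bad))
```

Collecting the errors instead of rethrowing means one pass reports every problem column, not just the first.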
@tkf thanks. Okay, this makes more sense. I will open an issue with Pandas.jl. The type system still throws me off sometimes, but I am slowly getting the hang of the errors. That is a great idea, to see how DataValue is registering these values and columns. I was going through and converting each column to a Pandas.DataFrame one at a time to find the problem column, but your approach is probably faster.