Replacing missing values in a DataFrame: convert(::Type{Union{}}, ::Float64) is ambiguous

Hi, I am a new Julia user and I am having difficulty imputing missing values in my dataframe. The replacement value depends on the position of the column and on the other values in that row. Coming from a Java background, I used loops, i.e.:

for row_counter in 1:size(df, 1)
    for column_counter in 1:size(df, 2)
        if ismissing(df[row_counter, column_counter])
            # get the next non-missing value for that row
            next_val = get_next_non_missing_val(df[row_counter, :], column_counter)
            if column_counter == 1 # Julia indexes from 1, so the first column is 1, not 0
                if next_val == -1 # custom logic
                    next_val = 100
                end
            elseif column_counter == size(df, 2)
                if next_val == -1
                    # custom logic omitted
                end
                df[row_counter, column_counter] = (df[row_counter, column_counter-1] + next_val) / 2
                println(df[row_counter, column_counter])
            end
        end
    end
end

However, I keep getting the following error:

ERROR: LoadError: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous. Candidates:
  convert(::Type{Union{}}, x) in Base at essentials.jl:169
  convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
  convert(::Type{T}, arg) where T<:VecElement in Base at baseext.jl:8
  convert(::Type{T}, x::Number) where T<:AbstractChar in Base at char.jl:179
Possible fix, define
  convert(::Type{Union{}}, ::Number)

I believe this is because my dataframe has missing values, so its columns are treated as a Union type, but I don’t know how to resolve the issue. Can someone please help me, and also advise whether the approach I adopted to impute the missing values is efficient?


You cut off the useful part of the stack trace, and we don’t know where this error happened in your code either, so it’s a bit hard to come up with ideas. At first glance it doesn’t seem to be in the code you displayed. Maybe it is in a function you’re calling?

Thanks for the reply. Sorry for the incomplete stacktrace.
The error is thrown in the line:
df[row_counter,column_counter] = (df[row_counter,column_counter-1]+next_val)/2

This is the rest of the stacktrace:

 [1] convert(::Type{Missing}, ::Float64) at ./missing.jl:69
 [2] setindex!(::Array{Missing,1}, ::Float64, ::Int64) at ./array.jl:847
 [3] insert_single_entry!(::DataFrame, ::Float64, ::Int64, ::Int64) at /home/bumblebee/.julia/packages/DataFrames/yqToF/src/dataframe/dataframe.jl:520
 [4] setindex!(::DataFrame, ::Float64, ::Int64, ::Int64) at /home/bumblebee/.julia/packages/DataFrames/yqToF/src/dataframe/dataframe.jl:560
 [5] handle_missing_values(::DataFrame) at parser.jl:103

Aha, so it appears that one of your columns contains only missing values. It is therefore typed Array{Missing,1}, and you can’t put a Float64 into it because floats can’t be converted to Missing.

You can convert the column to eltype Union{Float64,Missing} first and then it will work.
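
A minimal sketch of that fix using only Base Julia (no DataFrames), assuming a column that holds nothing but missing values. The variable names are illustrative, and the exact exception type and message depend on the Julia version:

```julia
# A vector with only missing values gets the narrow element type Missing.
col = [missing, missing, missing]
@show eltype(col)

# Writing a Float64 into it fails, because a Float64 cannot be stored as Missing.
failed = try
    col[1] = 1.5
    false
catch err
    println("assignment failed with ", typeof(err))
    true
end

# Widen the element type first; the existing missing entries are kept.
widened = convert(Vector{Union{Float64,Missing}}, col)
widened[1] = 1.5
@show eltype(widened)
@show widened
```

The same idea applies to a DataFrame column: replace the all-missing column with a widened copy before assigning numbers into it.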

I have added the following before the for loop:

for name in names(df)
    if eltype(df[!, Symbol(name)]) == Missing
        df[!, Symbol(name)] = convert(Vector{Union{Float64,Missing}}, df[!, Symbol(name)])
    end
end

It works, though I’m not sure if this is the best approach. Any suggestions?

  1. You don’t have to add Symbol. DataFrames now accepts strings for column indexing.
  2. You can write Vector{Union{Float64, Missing}}(x) instead of calling convert.
  3. If you are reading data from a CSV, you can specify the eltypes of certain columns on import, which is probably the most elegant solution.
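
Suggestions 1 and 2 applied to the widening loop might look like the sketch below. It assumes a recent DataFrames.jl release in which names(df) returns strings, and the example data is made up for illustration:

```julia
using DataFrames

# Hypothetical example data: column "b" contains only missing values.
df = DataFrame(a = [1.0, missing, 3.0], b = [missing, missing, missing])

for name in names(df)                    # names(df) returns column names as Strings
    if eltype(df[!, name]) == Missing    # string indexing, no Symbol(name) needed
        # Constructor form instead of convert(...)
        df[!, name] = Vector{Union{Float64,Missing}}(df[!, name])
    end
end

# The previously all-missing column now accepts Float64 values.
df[2, "b"] = 2.5
@show eltype(df.b)
```

For suggestion 3, CSV.jl has a types keyword for setting column types at read time; check its documentation for the exact form your version accepts.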

Thanks all for the help!