(DataFrames.jl) Initially allowing a DataFrame to receive missing

According to allowmissing! in DataFrames.jl,
it is possible to make a DataFrame “able to receive missing”:

julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame

julia> push!(df, (;a=1, b=1.0))
1×2 DataFrame
 Row │ a      b
     │ Int64  Float64
─────┼────────────────
   1 │     1      1.0

julia> push!(df, (;a=1, b=missing))
┌ Error: Error adding value to column :b.
└ @ DataFrames /Users/jinrae/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1483
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:262
  ...
Stacktrace:
 [1] push!(a::Vector{Float64}, item::Missing)
   @ Base ./array.jl:994
 [2] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Missing}}; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1465
 [3] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Missing}})
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1353
 [4] top-level scope
   @ REPL[4]:1

julia> allowmissing!(df)
1×2 DataFrame
 Row │ a       b
     │ Int64?  Float64?
─────┼──────────────────
   1 │      1       1.0

julia> push!(df, (;a=1, b=missing))
2×2 DataFrame
 Row │ a       b
     │ Int64?  Float64?
─────┼───────────────────
   1 │      1        1.0
   2 │      1  missing

However, I don’t see how to make a DataFrame accept missing data from the start.
For example, the following didn’t work.

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame

julia> allowmissing!(df)
0×0 DataFrame

julia> push!(df, (;a=1, b=1.0))
1×2 DataFrame
 Row │ a      b
     │ Int64  Float64
─────┼────────────────
   1 │     1      1.0

julia> push!(df, (;a=1, b=missing))
┌ Error: Error adding value to column :b.
└ @ DataFrames /Users/jinrae/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1483
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:262
  ...
Stacktrace:
 [1] push!(a::Vector{Float64}, item::Missing)
   @ Base ./array.jl:994
 [2] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Missing}}; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1465
 [3] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Missing}})
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1353
 [4] top-level scope
   @ REPL[5]:1

One remedy is to pre-assign the column types:

julia> df = DataFrame(a = Int[], b = Union{Missing,Float64}[])
0×2 DataFrame

julia> push!(df, (;a=1, b=1.0))
1×2 DataFrame
 Row │ a      b
     │ Int64  Float64?
─────┼─────────────────
   1 │     1       1.0

julia> push!(df, (;a=1, b=missing))
2×2 DataFrame
 Row │ a      b
     │ Int64  Float64?
─────┼──────────────────
   1 │     1        1.0
   2 │     1  missing

But what if I don’t want to specify the type of each column? Sometimes I can’t even specify the columns themselves.

I don’t think what you’re after is possible - an empty DataFrame is actually empty, so can’t hold type information:

julia> df = DataFrame()
0×0 DataFrame

julia> getfield(df, :columns)
AbstractVector[]

so the columns only get created when you push! something. You can specify columns without types, which gives them element type Any so that they accept any value:

julia> df = DataFrame(a = [], b = [])
0×2 DataFrame

julia> getfield(df, :columns)
2-element Vector{AbstractVector}:
 Any[]
 Any[]

julia> push!(df, (; a = 5.0, b = 4))
1×2 DataFrame
 Row │ a    b   
     │ Any  Any 
─────┼──────────
   1 │ 5.0  4

julia> push!(df, (; a = 5.0, b = missing))
2×2 DataFrame
 Row │ a    b       
     │ Any  Any     
─────┼──────────────
   1 │ 5.0  4
   2 │ 5.0  missing 

but this is of course problematic from a performance perspective, and you still need to know the number of columns.
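
If the Any columns are only needed while the rows are being collected, their element types can be narrowed afterwards. A minimal sketch, assuming the `mapcols` function from DataFrames.jl and the `identity.(col)` broadcasting idiom (neither is mentioned above); `narrowed` is just a name picked for illustration:

```julia
using DataFrames

# Collect rows into untyped (Any) columns, as in the example above.
df = DataFrame(a = [], b = [])
push!(df, (a = 5.0, b = 4))
push!(df, (a = 5.0, b = missing))

# Broadcasting `identity` rebuilds each column with the narrowest
# element type that fits the values it actually holds.
narrowed = mapcols(col -> identity.(col), df)

eltype(narrowed.a)  # Float64
eltype(narrowed.b)  # Union{Missing, Int64}
```

This does not remove the cost of pushing into Any columns; it only gives you concretely typed columns for whatever comes next.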

Why can’t you just do allowmissing!(df) after the first push!?

I thought about your suggestion, but what if there is a missing in the first push!?
Then the corresponding column’s type would be Missing, which does not accept values of any other type.

For example,

julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame

julia> push!(df, (;a=1, b=missing))
1×2 DataFrame
 Row │ a      b
     │ Int64  Missing
─────┼────────────────
   1 │     1  missing

julia> allowmissing!(df)
1×2 DataFrame
 Row │ a       b
     │ Int64?  Missing
─────┼─────────────────
   1 │      1  missing

julia> push!(df, (;a=1, b=1.0))
┌ Error: Error adding value to column :b.
└ @ DataFrames /Users/jinrae/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1483
ERROR: MethodError: convert(::Type{Union{}}, ::Float64) is ambiguous. Candidates:
  convert(::Type{T}, x::Number) where T<:AbstractChar in Base at char.jl:184
  convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
  convert(::Type{Union{}}, x) in Base at essentials.jl:216
  convert(::Type{T}, arg) where T<:VecElement in Base at baseext.jl:19
Possible fix, define
  convert(::Type{Union{}}, ::Number)
Stacktrace:
 [1] convert(#unused#::Type{Missing}, x::Float64)
   @ Base ./missing.jl:69
 [2] push!(a::Vector{Missing}, item::Float64)
   @ Base ./array.jl:994
 [3] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Float64}}; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1465
 [4] push!(df::DataFrame, row::NamedTuple{(:a, :b), Tuple{Int64, Float64}})
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1353
 [5] top-level scope
   @ REPL[5]:1

I just find it hard to imagine a situation where I push! things repeatedly to a DataFrame and have no idea about the number of columns or their types.

Maybe a DataFrame isn’t the right thing for your use case. Why not just a vector of vectors?

Hmm…

You might be right.
I need to consider my problem again.
Thank you for your advice :)

An easy way to build a table row-by-row without specifying types upfront:

using BangBang
tbl = Union{}[]
tbl = push!!(tbl, (a=1, b=1.0))
tbl = push!!(tbl, (a=1, b=missing))
...
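
Rows collected this way can be handed to the DataFrame constructor at the end. A rough sketch with a plain `NamedTuple[]` vector from Base instead of BangBang, on the assumption that every row has the same field names; `rows` is a hypothetical name:

```julia
using DataFrames

# An abstractly typed vector accepts any NamedTuple, so no column
# types have to be declared before the first row arrives.
rows = NamedTuple[]
push!(rows, (a = 1, b = 1.0))
push!(rows, (a = 1, b = missing))

# The constructor walks the rows and widens each column's element
# type to fit the values it finds.
df = DataFrame(rows)

eltype(df.a)  # Int64
eltype(df.b)  # Union{Missing, Float64}
```

The trade-off is that the rows live in a loosely typed vector until the conversion, so this fits "collect first, analyse later" better than interleaved pushes and queries.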

DataFrames.jl has this too

julia> df = DataFrame();

julia> push!(df, (a =1, b = 2))
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> push!(df, (a =3, b = missing); promote = true)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1        2
   2 │     3  missing
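
For the case where even the columns are not known in advance, push! also takes a `cols` keyword (it is visible in the stack traces above). A minimal sketch, assuming `cols = :union` behaves as documented in DataFrames.jl; the column `c` is made up for illustration:

```julia
using DataFrames

df = DataFrame()
push!(df, (a = 1, b = 1.0))

# With cols = :union, a column that is new in the row is added to df
# (earlier rows get missing), and a column absent from the row
# receives missing. Promotion is enabled by default in this mode.
push!(df, (a = 2, c = "x"); cols = :union)

names(df)     # ["a", "b", "c"]
eltype(df.b)  # Union{Missing, Float64}
eltype(df.c)  # Union{Missing, String}
```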

Ah, that’s the solution then, as this also works in reverse (i.e. it promotes a Missing-only column to the appropriate union type).
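
A minimal sketch of that reverse case, reusing the rows from the failing example earlier in the thread:

```julia
using DataFrames

df = DataFrame()

# The first row carries a missing, so column b starts out as Vector{Missing}.
push!(df, (a = 1, b = missing))

# promote = true widens b instead of throwing a MethodError.
push!(df, (a = 1, b = 1.0); promote = true)

eltype(df.b)  # Union{Missing, Float64}
```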
