Issue adding a row record of a DataFrame with `String` name to itself

I have a fairly complicated dataframe df, and I want to append one of its rows to the end, i.e., append!(df, df[1,:]). When I wrap the row in a DataFrame first, I get an error:

julia> append!(df, DataFrame(df[1,:]))
┌ Error: Error adding value to column :ETE_FP.
└ @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1423
ERROR: AssertionError: length(col) == targetrows
Stacktrace:
 [1] append!(df1::DataFrame, df2::DataFrame; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1407
 [2] append!(df1::DataFrame, df2::DataFrame)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1315
 [3] top-level scope
   @ REPL[290]:1

I also tried passing the row directly:

julia> append!(df, df[1,:])
ERROR: ArgumentError: 'DataFrameRow{DataFrame, DataFrames.Index}' iterates 'String' values, which doesn't satisfy the Tables.jl `AbstractRow` interface
Stacktrace:
 [1] invalidtable(#unused#::DataFrameRow{DataFrame, DataFrames.Index}, #unused#::String)
   @ Tables ~/.julia/packages/Tables/OWzlh/src/tofromdatavalues.jl:42
 [2] iterate
   @ ~/.julia/packages/Tables/OWzlh/src/tofromdatavalues.jl:48 [inlined]
 [3] buildcolumns
   @ ~/.julia/packages/Tables/OWzlh/src/fallbacks.jl:199 [inlined]
 [4] columns
   @ ~/.julia/packages/Tables/OWzlh/src/fallbacks.jl:262 [inlined]
 [5] DataFrame(x::DataFrameRow{DataFrame, DataFrames.Index}; copycols::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:58
 [6] #append!#785
   @ ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:69 [inlined]
 [7] append!(df::DataFrame, table::DataFrameRow{DataFrame, DataFrames.Index})
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:65
 [8] top-level scope
   @ REPL[293]:1

The column “ETE_FP” is just a vector of Ints. I was confused by this error, so I tried to create a smaller example. I couldn’t get the same error, but I got another one…

I created a simple dataframe using Julia 1.6:

julia> mydf = DataFrame("mycol" => [257; 258])
2×1 DataFrame
 Row │ mycol
     │ Int64
─────┼────────
   1 │    257
   2 │    258

and I want to append its first row to mydf. Of the following four attempts (two with append! and two with push!), only the fourth works. Why does append! not work?

1: julia> append!(mydf, mydf[1,!])

ERROR: MethodError: no method matching getindex(::DataFrame, ::Int64, ::typeof(!))
Closest candidates are:
  getindex(::AbstractDataFrame, ::Integer, ::Colon) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:210
  getindex(::AbstractDataFrame, ::Integer, ::Union{Colon, Regex, AbstractVector{T} where T, All, Between, Cols, InvertedIndex}) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:208
  getindex(::DataFrame, ::Integer, ::Union{AbstractString, Symbol}) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:488
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[283]:1

2: julia> append!(mydf, mydf[1,:])

ERROR: ArgumentError: 'DataFrameRow{DataFrame, DataFrames.Index}' iterates 'Int64' values, which doesn't satisfy the Tables.jl `AbstractRow` interface
Stacktrace:
 [1] invalidtable(#unused#::DataFrameRow{DataFrame, DataFrames.Index}, #unused#::Int64)
   @ Tables ~/.julia/packages/Tables/OWzlh/src/tofromdatavalues.jl:42
 [2] iterate
   @ ~/.julia/packages/Tables/OWzlh/src/tofromdatavalues.jl:48 [inlined]
 [3] buildcolumns
   @ ~/.julia/packages/Tables/OWzlh/src/fallbacks.jl:199 [inlined]
 [4] columns
   @ ~/.julia/packages/Tables/OWzlh/src/fallbacks.jl:262 [inlined]
 [5] DataFrame(x::DataFrameRow{DataFrame, DataFrames.Index}; copycols::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:58
 [6] #append!#785
   @ ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:69 [inlined]
 [7] append!(df::DataFrame, table::DataFrameRow{DataFrame, DataFrames.Index})
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/other/tables.jl:65
 [8] top-level scope
   @ REPL[277]:1

3: julia> push!(mydf,mydf[1,!])

ERROR: MethodError: no method matching getindex(::DataFrame, ::Int64, ::typeof(!))
Closest candidates are:
  getindex(::AbstractDataFrame, ::Integer, ::Colon) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:210
  getindex(::AbstractDataFrame, ::Integer, ::Union{Colon, Regex, AbstractVector{T} where T, All, Between, Cols, InvertedIndex}) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:208
  getindex(::DataFrame, ::Integer, ::Union{AbstractString, Symbol}) at /Users/jakeroth/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:488
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[280]:1

4: julia> push!(mydf, mydf[1,:])

3×1 DataFrame
 Row │ mycol
     │ Int64
─────┼───────
   1 │   257
   2 │   258
   3 │   257

append! does not work because, in Julia, append! adds all elements of a collection, and mydf[1, :] is a single row (a DataFrameRow), not a collection of rows.
To add a single element, use push!. An example:

julia> df = DataFrame(a=1:3, b='a':'c')
3×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c

julia> df1 = df[1:1, :] # this is a collection of rows - a data frame in this case
1×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a

julia> row1 = df[1, :] # this is an element, a single row
DataFrameRow
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a

julia> append!(df, df1) # use append! to add collections
4×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c
   4 │     1  a

julia> push!(df, row1) # use push! to add elements
5×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c
   4 │     1  a
   5 │     1  a

This is the same as with vectors in Base Julia:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> x1 = x[1:1]
1-element Vector{Int64}:
 1

julia> e1 = x[1]
1

julia> append!(x, x1)
4-element Vector{Int64}:
 1
 2
 3
 1

julia> push!(x, e1)
5-element Vector{Int64}:
 1
 2
 3
 1
 1
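
Applied to the original question, here is a minimal sketch of the ways to add one row of a data frame to itself, assuming a DataFrames.jl version like the one used in this thread (the names df and row are only for illustration). Note that df[1, !] is not valid indexing: ! is accepted only in the row position, as in df[!, :a].

```julia
using DataFrames

df = DataFrame(a=1:3, b='a':'c')
row = df[1, :]               # a DataFrameRow: one element of df

push!(df, row)               # push! adds a single row
append!(df, df[1:1, :])      # append! needs a table, here a 1-row DataFrame
append!(df, DataFrame(row))  # or wrap the row in a one-row DataFrame first

nrow(df)                     # 3 original rows plus 3 added rows
```

All three calls add a copy of the first row, so which one to use mostly depends on whether you already hold a single row or a table.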

Thanks! The difference between append! and push! is now clear to me.

However, I still get the original error on my full data frame:

julia> append!(df,df)
┌ Error: Error adding value to column :ETE_FP.
└ @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1423
ERROR: AssertionError: length(col) == targetrows
Stacktrace:
 [1] append!(df1::DataFrame, df2::DataFrame; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1407
 [2] append!(df1::DataFrame, df2::DataFrame)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframe/dataframe.jl:1315
 [3] top-level scope
   @ REPL[52]:1

and I haven’t been able to identify the cause. When I build progressively larger dataframes (iterating through the columns), pushing rows works:

for i in 1:ncol(df)  # iterate over 1, 2, ..., ncol(df) columns
  println(i)
  mydf = deepcopy(df)
  mydf = deepcopy(mydf[:, 1:i])
  for j in 1:5
    push!(mydf, mydf[1, :])
  end
end

but when I then try the same thing on the full data frame, I get an error:

julia> mydf = deepcopy(df)
julia> push!(mydf, mydf[1,:])
ERROR: AssertionError: Error adding value to column :ETE_FP
Stacktrace:
 [1] push!(df::DataFrame, dfr::DataFrameRow{DataFrame, DataFrames.Index}; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:524
 [2] push!(df::DataFrame, dfr::DataFrameRow{DataFrame, DataFrames.Index})
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:503
 [3] top-level scope
   @ REPL[56]:1

and, somehow, a different error when I take the row first:

julia> mydf = deepcopy(df)
julia> x = mydf[1,:]
DataFrameRow
 Row │ ACID    ASQPReportedCarrierDelay  ASQPReportedLateArrivalDelay  ASQPReportedNASDelay  ASQPReportedSecu ⋯
     │ Any     Any                       Any                           Any                   Any              ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ UAL548  0                         0                             0                     0                ⋯
                                                                                            106 columns omitted

julia> push!(mydf, x)
┌ Error: Error adding value to column :ArrTimeGDPPerturbed_SystemTime.
└ @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:630
ERROR: AssertionError: length(col) == targetrows
Stacktrace:
 [1] push!(df::DataFrame, dfr::DataFrameRow{DataFrame, DataFrames.Index}; cols::Symbol, promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:623
 [2] push!(df::DataFrame, dfr::DataFrameRow{DataFrame, DataFrames.Index})
   @ DataFrames ~/.julia/packages/DataFrames/BM4OQ/src/dataframerow/dataframerow.jl:503
 [3] top-level scope
   @ REPL[55]:1

Some of my columns hold vectors as values, e.g.,

julia> [typeof(z) for z in values(x)]
 ⋮
Float64
String
Vector{Float64}
 ⋮

This should not happen normally:

julia> df = DataFrame(a=1:3, b='a':'c')
3×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c

julia> append!(df, df)
6×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c
   4 │     1  a
   5 │     2  b
   6 │     3  c

However, it can happen if you have columns that are aliases:

julia> df = DataFrame(a=1:3, b='a':'c')
3×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c

julia> df.c = df.a
3-element Vector{Int64}:
 1
 2
 3

julia> append!(df, df)
┌ Error: Error adding value to column :a.
└ @ DataFrames ~\.julia\packages\DataFrames\MA4YO\src\dataframe\dataframe.jl:1423
ERROR: AssertionError: length(col) == targetrows

While a DataFrame allows storing aliased columns, you should avoid doing this, as it can lead to errors (which are caught, as in this case, but are in general hard to diagnose).

In my example above, it would be better to write df.c = copy(df.a) instead of df.c = df.a; then everything works.
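
A minimal sketch of that repair on a data frame that already contains the alias, assuming you know which column to replace (here :c, as in the example above):

```julia
using DataFrames

df = DataFrame(a=1:3, b='a':'c')
df.c = df.a          # alias: :a and :c are the same vector
df.a === df.c        # true, both names point at one object

df.c = copy(df.c)    # replace :c with an independent copy
df.a === df.c        # false, the alias is gone

append!(df, df)      # now succeeds
```

The values of :c do not change; only the underlying storage becomes separate from :a.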

To be clear about what alias means: two columns are aliases when they are the identical object (i.e., they share the same memory location), not merely equal in value.
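
To find such columns in a large data frame, you can compare columns with ===. The sketch below uses a hypothetical helper, aliased_columns, which is not part of DataFrames.jl. It only detects columns that are the very same object, not vectors that partially share memory (e.g. views). It also shows that deepcopy preserves aliases, which would be consistent with deepcopy(df) still failing in the question.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): list pairs of column
# names whose columns are the very same object in memory.
function aliased_columns(df::DataFrame)
    cols = eachcol(df)
    nms = names(df)
    found = Tuple{String, String}[]
    for i in 1:ncol(df), j in (i + 1):ncol(df)
        cols[i] === cols[j] && push!(found, (nms[i], nms[j]))
    end
    return found
end

df = DataFrame(a=1:3, b='a':'c')
df.c = df.a

aliased_columns(df)            # [("a", "c")]
aliased_columns(deepcopy(df))  # still [("a", "c")]: deepcopy keeps aliases
```

Once a pair is reported, replacing one of the two columns with a copy of itself removes the alias.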


Aha, it seems that somewhere I have an alias! Thanks a lot, I will try to identify the alias and copy() it!