Help with appending row in read in DataFrame (weird behavior)

mb96 · September 24, 2022, 2:38am

If I directly define a dataframe as

GMM_data = DataFrame(ror = [], rev_C = [], rev_NG = [], uu_1_C = [], uu_2_C = [], uu_3_C = [], uu_4_C = [], uu_1_NG = [], uu_2_NG = [], uu_3_NG = [], uu_4_NG = [],
        ramp_C = [], ramp_NG = [], g_h_C = [], g_h_NG = [], g_h_NGC = [], prof = [], ror_dif  = [], rev_C_dif = [], rev_NG_dif = [],
        uu_1_C_dif = [], uu_2_C_dif = [], uu_3_C_dif = [], uu_4_C_dif = [], uu_1_NG_dif = [], uu_2_NG_dif = [], uu_3_NG_dif = [], uu_4_NG_dif = [],
            ramp_C_dif = [], ramp_NG_dif = [], g_h_C_dif = [], g_h_NG_dif = [], g_h_NGC_dif = [], prof_dif = [], gmm = []);

Then I append a vector of the same size using push!:

push!(GMM_data, Any[-464547.4242273845, 3.853048934599117e8, 6.594229905162156e8, 2.3667284934715833, 2.337350615008504, 2.2234660448464014, 0.767728159659241, 0.0, -0.0255604538388607, -0.021426232084953166, -0.006821965746661444, 0.013891513438316122, 0.0006265685948713331, 0.8547474974727632, 0.06211081603260709, 0.08826851897694173, 7.006229768868772e9, 464547.39567910414, -3.853049537341995e8, -6.594231521204182e8, -1.8634861495585215, -1.008591746762811, -1.1093025904745037, 0.0839979043036051, -0.46252218106964815, -0.15331777535252566, -0.04712077936233181, 0.053266307472369115, -0.00933061823377325, 0.0004751324744484023, -0.318557425856661, 0.29173434423850525, -0.007439262327100238, -7.006228930191487e9, 2.164807258930988e14])
1×35 DataFrame
 Row │ ror        rev_C      rev_NG     uu_1_C   uu_2_C   uu_3_C   uu_4_C    ⋯
     │ Any        Any        Any        Any      Any      Any      Any       ⋯
─────┼────────────────────────────────────────────────────────────────────────
   1 │ -464547.0  3.85305e8  6.59423e8  2.36673  2.33735  2.22347  0.767728  ⋯
                                                            28 columns omitted

That works fine. However, if I define the empty dataframe with the same columns as before and write it out:

CSV.write("/users/miguelborrero/Desktop/Energy_Transitions/Data/GMM_data.csv", GMM_data);

Then I read it back in and perform the exact same push as before, and it gives a weird error:

GMM_data = CSV.read("/users/miguelborrero/Desktop/Energy_Transitions/Data/GMM_data.csv", DataFrame)
0×35 DataFrame

push!(GMM_data, Any[-464547.4242273845, 3.853048934599117e8, 6.594229905162156e8, 2.3667284934715833, 2.337350615008504, 2.2234660448464014, 0.767728159659241, 0.0, -0.0255604538388607, -0.021426232084953166, -0.006821965746661444, 0.013891513438316122, 0.0006265685948713331, 0.8547474974727632, 0.06211081603260709, 0.08826851897694173, 7.006229768868772e9, 464547.39567910414, -3.853049537341995e8, -6.594231521204182e8, -1.8634861495585215, -1.008591746762811, -1.1093025904745037, 0.0839979043036051, -0.46252218106964815, -0.15331777535252566, -0.04712077936233181, 0.053266307472369115, -0.00933061823377325, 0.0004751324744484023, -0.318557425856661, 0.29173434423850525, -0.007439262327100238, -7.006228930191487e9, 2.164807258930988e14])
┌ Error: Error adding value to column :ror.
└ @ DataFrames ~/.julia/packages/DataFrames/JZ7x5/src/dataframe/dataframe.jl:1719
ERROR: StackOverflowError:
Stacktrace:
 [1] append! at /Users/miguelborrero/.julia/packages/SentinelArrays/EQtMp/src/missingvector.jl:109 [inlined]
 [2] push!(::SentinelArrays.MissingVector, ::Float64) at ./array.jl:961
 ... (the last 2 lines are repeated 79982 more times)
 [159967] append! at /Users/miguelborrero/.julia/packages/SentinelArrays/EQtMp/src/missingvector.jl:109 [inlined]

How can the two processes not be equivalent, and why does the second one fail?

Thanks a lot in advance.

rocco_sprmnt21 · September 24, 2022, 6:10am

I am not able to reproduce your case exactly, but while waiting for someone to explain the exact reason for the error you get, you could try using a tuple (or a NamedTuple) instead of the Vector{Any} to add a row to the empty df.

mb96 · September 24, 2022, 6:38am

Thanks for your reply rocco. To reproduce my case, just define an empty dataframe with two columns (e.g. x = [], y = []) and save it to a CSV file. Read that same CSV into a dataframe and try to push a row of floats (e.g. push!(df, [1.2, 2.2])), and you should get the same error. But as you suggested, I will try with a tuple. Thanks again.

bkamins · September 24, 2022, 8:27am

When reading the file back use:

GMM_data = CSV.read("/users/miguelborrero/Desktop/Energy_Transitions/Data/GMM_data.csv", DataFrame, types=Float64)

as I assume you want to store floats in the columns.

The issue is that when reading back an empty data frame, CSV.read has no rows from which to infer the eltype of the columns, so it assumes they allow only missing values. Therefore you need to pass a hint telling it which eltype you want the columns to accept.
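
For anyone who wants to see this on a small case, here is a minimal sketch of the round trip. The two-column frame, the names `raw` and `typed`, and the temporary file path are stand-ins chosen for illustration, not the original 35-column data or path.

```julia
using CSV, DataFrames

# Hypothetical two-column stand-in for the 35-column GMM_data frame.
df = DataFrame(x = [], y = [])

# Temporary file used only for this illustration (not the original path).
path = tempname() * ".csv"
CSV.write(path, df)

# Read back without a hint: the file has a header but no rows, so there
# is nothing to infer the column eltypes from.
raw = CSV.read(path, DataFrame)
@show eltype.(eachcol(raw))

# Read back with the types hint: every column is declared Float64,
# so pushing a row of floats is accepted.
typed = CSV.read(path, DataFrame; types = Float64)
push!(typed, [1.2, 2.2])
@show typed
```

Comparing the two `@show eltype...` results before and after adding `types = Float64` should make it clear why only the second frame accepts the row.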

mb96 · September 24, 2022, 3:31pm

Thanks a lot!

rocco_sprmnt21 · September 24, 2022, 3:56pm

Using a NamedTuple alone is not enough (I can’t repeat the steps that led me to think it was), but with the cols keyword argument you get past the type check.

julia> GMM_data = DataFrame(ror = 1, rev_C = 2, rev_NG = 3, uu_1_C = 4)
1×4 DataFrame
 Row │ ror    rev_C  rev_NG  uu_1_C 
     │ Int64  Int64  Int64   Int64
─────┼──────────────────────────────
   1 │     1      2       3       4

julia> gmm_empty=empty(GMM_data)
0×4 DataFrame

julia> push!(gmm_empty, (ror=1.4,rev_C=1.2,rev_NG=1.24242,uu_1_C=1.242222))
┌ Error: Error adding value to column :ror.
└ @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1328
ERROR: InexactError: Int64(1.4)
Stacktrace:
 [1] Int64
   @ .\float.jl:788 [inlined]
 [2] convert
   @ .\number.jl:7 [inlined]
 [3] push!(a::Vector{Int64}, item::Float64)
   @ Base .\array.jl:1057
 [4] push!(df::DataFrame, row::NamedTuple{(:ror, :rev_C, :rev_NG, :uu_1_C), NTuple{4, Float64}}; cols::Symbol, promote::Bool)
   @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1310
 [5] push!(df::DataFrame, row::NamedTuple{(:ror, :rev_C, :rev_NG, :uu_1_C), NTuple{4, Float64}})
   @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1195
 [6] top-level scope
   @ c:\Users\sprmn\.julia\v1.8\dataframes21.jl:111

julia> push!(gmm_empty, (ror=1.4,rev_C=1.2,rev_NG=1.24242,uu_1_C=1.242222), cols=:union)
1×4 DataFrame
 Row │ ror      rev_C    rev_NG   uu_1_C  
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │     1.4      1.2  1.24242  1.24222

On the other hand, if you use types that are “right”, or that can be promoted “well”, the new row is accepted.

julia> df=empty(DataFrame(x=1., y=2))
0×2 DataFrame

julia> push!(df, [1.1, 2.0])
1×2 DataFrame
 Row │ x        y     
     │ Float64  Int64
─────┼────────────────
   1 │     1.1      2

julia> df=empty(DataFrame(x=1., y=2))
0×2 DataFrame

julia> push!(df, (1.1, 2.0))
1×2 DataFrame
 Row │ x        y     
     │ Float64  Int64
─────┼────────────────
   1 │     1.1      2

julia> df=empty(DataFrame(x=1., y=2))
0×2 DataFrame

julia> push!(df, (x=1.1,y=2.1))
┌ Error: Error adding value to column :y.
└ @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1328
ERROR: InexactError: Int64(2.1)
Stacktrace:
 [1] Int64
   @ .\float.jl:788 [inlined]
 [2] convert
   @ .\number.jl:7 [inlined]
 [3] push!(a::Vector{Int64}, item::Float64)
   @ Base .\array.jl:1057
 [4] push!(df::DataFrame, row::NamedTuple{(:x, :y), Tuple{Float64, Float64}}; cols::Symbol, promote::Bool)
   @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1310
 [5] push!(df::DataFrame, row::NamedTuple{(:x, :y), Tuple{Float64, Float64}})
   @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:1195
 [6] top-level scope
   @ c:\Users\sprmn\.julia\v1.8\dataframes21.jl:89

julia> df=empty(DataFrame(x=1., y=2))
0×2 DataFrame

julia> push!(df, (x=1.1,y=2.5), cols=:union)
1×2 DataFrame
 Row │ x        y       
     │ Float64  Float64
─────┼──────────────────
   1 │     1.1      2.5

bkamins · September 24, 2022, 5:29pm

As was commented in JuliaData/CSV.jl issue #1027 (“Allow `Any` type in column”), the alternative solution is to add the promote=true kwarg:

push!(df, [ some data ...], promote=true)
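
A minimal sketch of how this behaves, assuming a frame whose columns only allow missing values (built by hand here as a hypothetical stand-in for the frame read back from the empty CSV):

```julia
using DataFrames

# Hypothetical stand-in: columns that only allow `missing`, similar to
# what the thread describes for an empty CSV read without a type hint.
df = DataFrame(x = Missing[], y = Missing[])

# promote=true lets push! widen each column's eltype to fit the new value
# instead of throwing an error.
push!(df, [1.2, 2.2]; promote = true)

@show eltype.(eachcol(df))
@show df
```

Note that with this approach the columns are widened rather than replaced, so they keep allowing missing values; if plain Float64 columns are wanted, the types hint at read time is the more direct route.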
