Adding seed row to empty dataframe then updating it

I seem to be missing something here. I thought I understood how push works but obviously not. Can someone help please.

I have a dataframe defined

describe(df_dash_table)
9×7 DataFrame
 Row │ variable  mean    min      median   max      nmissing  eltype   
     │ Symbol    Union…  Nothing  Nothing  Nothing  Int64     DataType 
─────┼─────────────────────────────────────────────────────────────────
   1 │ sym                                                 0  String
   2 │ price     NaN                                       0  Float64
   3 │ sdmove    NaN                                       0  Float64
   4 │ hv20      NaN                                       0  Float64
   5 │ hv10      NaN                                       0  Float64
   6 │ hv5       NaN                                       0  Float64
   7 │ iv        NaN                                       0  Float64
   8 │ iv%ile    NaN                                       0  Float64
   9 │ prc%ile   NaN                                       0  Float64

In the code below, a ZMQ message comes in and is parsed into four components.

message = "IND~SPX~LAST~4382.46"
source,sym_in,field_in ,value_in  = split( message , "~")

and I want to update the df_dash_table using the components like this.

df_dash_table[findfirst(==(sym_in),df_dash_table.sym),findfirst(==(field_in),names(df_dash_table))] = value_in

This won’t work because sym_in (in this case “SPX”) doesn’t exist in the table yet, so I want to add a seed row that I can then update. The seed row would be

"SPX" , 0.0,    0.0,    0.0 ,  0.0,  0.0 , 0.0   , 0.0 , 0.0 

using

push!(df_dash_table,       (sym_in, 0.0,    0.0,    0.0 ,  0.0,  0.0 , 0.0   , 0.0 , 0.0 )   )

then update the new row

df_dash_table[findfirst(==(sym_in),df_dash_table.sym),findfirst(==(field_in),names(df_dash_table))] = value_in

but this yields the error

 Error: Error adding value to column :price.
└ @ DataFrames ~/.julia/packages/DataFrames/zqFGs/src/dataframe/dataframe.jl:1719
ERROR: AssertionError: length(col) == targetrows
Stacktrace:
 [1] push!(df::DataFrame, row::Tuple{SubString{String}, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}; promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/zqFGs/src/dataframe/dataframe.jl:1712
 [2] push!(df::DataFrame, row::Tuple{SubString{String}, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64})
   @ DataFrames ~/.julia/packages/DataFrames/zqFGs/src/dataframe/dataframe.jl:1680
 [3] top-level scope
   @ REPL[12]:7

I don’t see why. When I read the push! docs, it seems to me that I should be able to add the seed row. What am I doing wrong, and is there a better way to handle this?

using DataFrames

dash_columns = ["sym","price","sdmove","hv20","hv10","hv5","iv","iv%ile","prc%ile"]

df_dash_table = DataFrame(fill( Float64[],length( dash_columns[2:end] )), dash_columns[2:end]; copycols=false)

insertcols!(df_dash_table,1, :sym => "")

message = "IND~SPX~PRICE~4382.46"

source,sym_in,field_in ,value_in  = split( message , "~")

push!(df_dash_table, (sym_in, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))

df_dash_table[findfirst(==(sym_in),df_dash_table.sym),findfirst(==(field_in),names(df_dash_table))] = value_in

The use case is that the next message to come in might be

message =  "IND~SPX~HV20~17.76"

so df_dash_table would go from

"sym","price","sdmove","hv20","hv10","hv5","iv","iv%ile","prc%ile"
"SPX", 4382.46,    0.0,    0.0 ,  0.0,  0.0 , 0.0   , 0.0 , 0.0 

to (updating the “hv20” value for “SPX” to 17.76)

"sym","price","sdmove","hv20","hv10","hv5","iv","iv%ile","prc%ile"
"SPX", 4382.46,    0.0,    17.76 ,  0.0,  0.0 , 0.0   , 0.0 , 0.0 

This message means that you have aliases in your columns, that is, two or more columns that share the same underlying vector. Most likely you have incorrectly initialized the data frame. Here is an example:

julia> df = DataFrame(a=1)
1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> df.b = df.a
1-element Vector{Int64}:
 1

julia> df
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1

julia> push!(df, (2, 2))
┌ Error: Error adding value to column :a.
└ @ DataFrames C:\Users\bogum\.julia\packages\DataFrames\zqFGs\src\dataframe\dataframe.jl:1719
ERROR: AssertionError: length(col) == targetrows

Here is a way to find which columns are aliases (the output below comes from a data frame that also has a third, non-aliased column c):

julia> d = IdDict()
IdDict{Any, Any}()

julia> for n in names(df)
           col = df[!, n]
           if haskey(d, col)
               push!(d[col], n)
           else
               d[col] = [n]
           end
       end

julia> d
IdDict{Any, Any} with 2 entries:
  [1] => ["a", "b"]
  [1] => ["c"]

The simplest way to de-alias is to make a copy:

julia> df = copy(df)
1×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1

julia> d = IdDict()
IdDict{Any, Any}()

julia> for n in names(df)
           col = df[!, n]
           if haskey(d, col)
               push!(d[col], n)
           else
               d[col] = [n]
           end
       end

julia> d
IdDict{Any, Any} with 3 entries:
  [1] => ["b"]
  [1] => ["a"]
  [1] => ["c"]

Side note: if you show me how you create your data frame, I will tell you the reason for the problem.

Hi there Professor,
I included the “code” in the OP but I think I obfuscated it with too much “detail”. Sorry. I am still battling with a few issues with DataFrames.jl, and the creation of the dataframe is one of them: I couldn’t figure out how to put the “sym” String column into the line that creates the dataframe.

using DataFrames

dash_columns = ["sym","price","sdmove","hv20","hv10","hv5","iv","iv%ile","prc%ile"]

df_dash_table = DataFrame(fill( Float64[],length( dash_columns[2:end] )), dash_columns[2:end]; copycols=false)

insertcols!(df_dash_table,1, :sym => "")

message = "IND~SPX~PRICE~4382.46"

source,sym_in,field_in ,value_in  = split( message , "~")

push!(df_dash_table, (sym_in, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))

df_dash_table[findfirst(==(sym_in),df_dash_table.sym),findfirst(==(field_in),names(df_dash_table))] = value_in

Using fill the way you do creates aliases:

julia> x = fill([], 2)
2-element Vector{Vector{Any}}:
 []
 []

julia> x[1] === x[2]
true
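To make the consequence concrete, here is a small sketch using only Base Julia (the variable names cols and independent are mine, not from your code). It shows that pushing through one slot of a fill-ed vector grows every slot, because they are all the same object; this is presumably what trips the length check inside push! on the data frame. A comprehension, by contrast, builds a fresh vector on each iteration.

```julia
# fill stores the same vector object in every slot.
cols = fill(Float64[], 3)

# Pushing through one slot grows the single shared vector.
push!(cols[1], 1.0)

println(length.(cols))          # every slot reports the same length
println(cols[1] === cols[2])    # both slots are the same object

# A comprehension evaluates Float64[] once per iteration, so each slot is distinct.
independent = [Float64[] for _ in 1:3]
push!(independent[1], 1.0)
println(length.(independent))   # only the first slot has grown
```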

The simplest way to avoid the problem is to remove copycols=false in your code and all will be OK. The DataFrame constructor copies data by default to avoid problems like the one you encountered.

Now an approach to avoid aliases in the first place could be the following. The simplest is to use a matrix constructor:

df_dash_table = DataFrame(zeros(Float64, 0, length(dash_columns)-1), dash_columns[2:end])

another approach would be to use a comprehension:

df_dash_table = DataFrame([Float64[] for _ in 1:length(dash_columns)-1], dash_columns[2:end])

finally you could do:

df_dash_table = DataFrame([col => (col == "sym" ? String : Float64)[] for col in dash_columns])

to create a whole data frame in one shot.
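Putting the pieces together, a minimal sketch of the seed-then-update flow from the original question might look like the following. The helper name apply_message! is hypothetical, and two things are assumptions on my side: that the incoming field name maps to a column by lowercasing it (so "HV20" goes to "hv20"), and that messages whose field has no matching column should simply be ignored. Note that split returns strings, so the value has to be parsed to Float64 before it is stored.

```julia
using DataFrames

dash_columns = ["sym", "price", "sdmove", "hv20", "hv10", "hv5", "iv", "iv%ile", "prc%ile"]

# One independent, empty vector per column, so no two columns alias each other.
df_dash_table = DataFrame([col => (col == "sym" ? String : Float64)[] for col in dash_columns])

# Hypothetical helper (not from the thread): apply one "SOURCE~SYM~FIELD~VALUE" message.
function apply_message!(df::DataFrame, message::AbstractString)
    source, sym_in, field_in, value_in = split(message, "~")
    col = lowercase(field_in)              # assumption: "HV20" maps to column "hv20"
    col in names(df) || return df          # assumption: ignore fields with no matching column
    row = findfirst(==(sym_in), df.sym)
    if row === nothing
        # Seed row: the symbol followed by 0.0 for every numeric column.
        push!(df, (String(sym_in), zeros(ncol(df) - 1)...))
        row = nrow(df)
    end
    df[row, col] = parse(Float64, value_in)  # split yields strings, so parse the number
    return df
end

apply_message!(df_dash_table, "IND~SPX~PRICE~4382.46")
apply_message!(df_dash_table, "IND~SPX~HV20~17.76")
```

After these two calls the table should hold a single "SPX" row with price and hv20 filled in and the remaining columns still at 0.0.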


Thank you again. As always, there is so much to unpack in your answers. I have so much to learn about this language. Next week I am setting off to run through your tutorials so I can stop bothering you with these ridiculous mistakes.

I’m going to use the below as it makes the most sense to me.

df_dash_table = DataFrame([col => (col == "sym" ? String : Float64)[] for col in dash_columns])

I am also finding this Rosetta stone incredibly useful to ease my transition from Python to Julia. I’ll get there in the end :)
Thank you again.