Insert dummy 0 value rows into dataframe

Hello All,

I’m trying to figure out a quick way to insert dummy rows with 0 values if they are missing for a set of index fields

For example, let’s say I have this dataframe:


data = DataFrame(:index => String[], :metric => String[], :values => Int64[])

push!(data, ["a", "x", 1])
push!(data, ["a", "y", 2])
push!(data, ["b", "x", 1])


julia> data
3×3 DataFrame
 Row │ index   metric  values 
     │ String  String  Int64  
─────┼────────────────────────
   1 │ a       x            1
   2 │ a       y            2
   3 │ b       x            1

For this table, I have metrics x,y for index A, but only metric X for index B. Is there a simple way to populate this table with index b, metric y, value = 0?

Of course, in my real table, there are thousands/millions of rows, so I don’t want to loop through these values one at a time to insert them. In my current program, I pivot the data wide, insert 0 values where they are missing and then depivot it back, but not sure if there is an easy way to do that/better way to do that in Julia DataFrames.

Thanks.

There is a keyword argument cols in push! where you can do cols = :union do add new columns.

However you can’t control what the value will be, so you will need to do coalesce after your loop.

Sorry I’m not quite sure what you mean by this. How can I use union in this case?

What you want is https://github.com/JuliaData/DataFrames.jl/pull/1864 but it was never finished. If you would find this functionality useful I can re-write this PR anew and add it.

It does sound like what I want. I’d use the function if it existed.

Thanks

For now you can have a look at the internals of the implementation - they should be relatively fast (but I do not guarantee it as it was written a long time ago).

Please read the docstring for push! with ?push. Here is an MWE that points you in the right direction.

julia> data = DataFrame(:index => String[], :metric => String[], :values => Int64[])
0×3 DataFrame

julia> datas = [(index = "a", metric = "x"), (index = "a", values = 1)];

julia> for d in datas
           push!(data, d; cols = :union)
       end

julia> data
2×3 DataFrame
 Row │ index   metric   values  
     │ String  String?  Int64?  
─────┼──────────────────────────
   1 │ a       x        missing 
   2 │ a       missing        1

My solution, (when columns are categorical unique should be changed to levels)

d = DataFrame([(index = i, metric =  m) 
                            for i in unique(data.index)
                            for m in unique(data.metric)])

leftjoin(d, data, on =[:index, :metric])