Identify and add missing rows dynamically in dataframe

I have scenario where I need to insert the missing data into dataframe dynamically.

3×3 DataFrame
 Row │ month   ac_hrs  crew_hrs 
     │ String  Int64   Int64    
─────┼──────────────────────────
   1 │ 202404       2         4
   2 │ 202405       4         6
   3 │ 202406       5         7
julia> permutedims(df, "month")
2×4 DataFrame
 Row │ month     202404  202405  202406 
     │ String    Int64   Int64   Int64  
─────┼──────────────────────────────────
   1 │ ac_hrs         2       4       5
   2 │ crew_hrs       4       6       7

I have used permutedims to get the data needed but I need to insert default data(0) for any months that is missing for eg. shown in above df we are missing data for months 202401, 202402, 202403

Why did you need to do permutedims?

A good strategy for this task in general is to have a second data frame that is “complete” and then use that as the base and update it

julia> using DataFrames, DataFramesMeta

julia> df = DataFrame(day = [1, 4], val = [100, 300]);

julia> df_complete = DataFrame(day = [1, 2, 3, 4], default_val = 0);

julia> new_df = leftjoin(df_complete, df, on = :day)
4×3 DataFrame
 Row │ day    default_val  val     
     │ Int64  Int64        Int64?  
─────┼─────────────────────────────
   1 │     1            0      100
   2 │     4            0      300
   3 │     2            0  missing 
   4 │     3            0  missing 

julia> @rtransform new_df :val = ismissing(:val) ? :default_val : :val
4×3 DataFrame
 Row │ day    default_val  val   
     │ Int64  Int64        Int64 
─────┼───────────────────────────
   1 │     1            0    100
   2 │     4            0    300
   3 │     2            0      0
   4 │     3            0      0

Another way forward (quite similar to pdeffebach suggestion):

julia> df = DataFrame(month=["202404","202405","202406"], ac_hrs=[2,4,5], crew_hrs=[4,6,7])
3×3 DataFrame
 Row │ month   ac_hrs  crew_hrs 
     │ String  Int64   Int64    
─────┼──────────────────────────
   1 │ 202404       2         4
   2 │ 202405       4         6
   3 │ 202406       5         7

julia> coalesce.(leftjoin(DataFrame(month=["2024$(lpad(i,2,'0'))" for i in 1:6]),df;on=:month, order=:left),0)
6×3 DataFrame
 Row │ month   ac_hrs  crew_hrs 
     │ String  Int64   Int64    
─────┼──────────────────────────
   1 │ 202401       0         0
   2 │ 202402       0         0
   3 │ 202403       0         0
   4 │ 202404       2         4
   5 │ 202405       4         6
   6 │ 202406       5         7

This is not the exact OP result, but makes it easy to get there and may be enough as it is.


dfOP=permutedims(df,"month")
rm=1:parse(Int,names(dfOP)[2][end-1:end])-1
y=names(dfOP)[2][1:4]

insertcols!(dfOP,2,((y.*lpad.(rm,2,'0')).=>0)...)

I’m not sure if the operation makes sense, but I report the following result.

julia> df
3×3 DataFrame
 Row │ month   ac_hrs  crew_hrs 
     │ String  Int64   Int64
─────┼──────────────────────────
   1 │ 202404       2         4
   2 │ 202405       4         6
   3 │ 202406       5         7

julia> 

julia> dfOP=permutedims(df,"ac_hrs")
ERROR: ArgumentError: all elements of src_namescol must support conversion to String
Stacktrace:

What does the ERROR that not all elements of the column support conversion to string mean?

@Dan This works fine but resulted df changes my column types to “Any” instead of “Float64” which is causing the issues for me, any suggestion on how can I convert them back to Float64?

Row │ month   ac_hrs  crew_hrs 
     │ String  Any   Any    
─────┼──────────────────────────
   1 │ 202404       2         4
   2 │ 202405       4         6

In your post the columns are Ints, not Float64s. So the 0 in the coalesce makes the columns Any. Try to replace the 0 with 0.0 and it should remain Float64.