I have scenario where I need to insert the missing data into dataframe dynamically.
3×3 DataFrame
Row │ month ac_hrs crew_hrs
│ String Int64 Int64
─────┼──────────────────────────
1 │ 202404 2 4
2 │ 202405 4 6
3 │ 202406 5 7
julia> permutedims(df, "month")
2×4 DataFrame
Row │ month 202404 202405 202406
│ String Int64 Int64 Int64
─────┼──────────────────────────────────
1 │ ac_hrs 2 4 5
2 │ crew_hrs 4 6 7
I have used permutedims to get the data needed but I need to insert default data(0) for any months that is missing for eg. shown in above df we are missing data for months 202401, 202402, 202403
Why did you need to do permutedims
?
A good strategy for this task in general is to have a second data frame that is “complete” and then use that as the base and update it
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(day = [1, 4], val = [100, 300]);
julia> df_complete = DataFrame(day = [1, 2, 3, 4], default_val = 0);
julia> new_df = leftjoin(df_complete, df, on = :day)
4×3 DataFrame
Row │ day default_val val
│ Int64 Int64 Int64?
─────┼─────────────────────────────
1 │ 1 0 100
2 │ 4 0 300
3 │ 2 0 missing
4 │ 3 0 missing
julia> @rtransform new_df :val = ismissing(:val) ? :default_val : :val
4×3 DataFrame
Row │ day default_val val
│ Int64 Int64 Int64
─────┼───────────────────────────
1 │ 1 0 100
2 │ 4 0 300
3 │ 2 0 0
4 │ 3 0 0
Dan
April 8, 2024, 11:22pm
3
Another way forward (quite similar to pdeffebach suggestion):
julia> df = DataFrame(month=["202404","202405","202406"], ac_hrs=[2,4,5], crew_hrs=[4,6,7])
3×3 DataFrame
Row │ month ac_hrs crew_hrs
│ String Int64 Int64
─────┼──────────────────────────
1 │ 202404 2 4
2 │ 202405 4 6
3 │ 202406 5 7
julia> coalesce.(leftjoin(DataFrame(month=["2024$(lpad(i,2,'0'))" for i in 1:6]),df;on=:month, order=:left),0)
6×3 DataFrame
Row │ month ac_hrs crew_hrs
│ String Int64 Int64
─────┼──────────────────────────
1 │ 202401 0 0
2 │ 202402 0 0
3 │ 202403 0 0
4 │ 202404 2 4
5 │ 202405 4 6
6 │ 202406 5 7
This is not the exact OP result, but makes it easy to get there and may be enough as it is.
dfOP=permutedims(df,"month")
rm=1:parse(Int,names(dfOP)[2][end-1:end])-1
y=names(dfOP)[2][1:4]
insertcols!(dfOP,2,((y.*lpad.(rm,2,'0')).=>0)...)
I’m not sure if the operation makes sense, but I report the following result.
julia> df
3×3 DataFrame
Row │ month ac_hrs crew_hrs
│ String Int64 Int64
─────┼──────────────────────────
1 │ 202404 2 4
2 │ 202405 4 6
3 │ 202406 5 7
julia>
julia> dfOP=permutedims(df,"ac_hrs")
ERROR: ArgumentError: all elements of src_namescol must support conversion to String
Stacktrace:
What does the ERROR that not all elements of the column support conversion to string mean?
@Dan This works fine but resulted df changes my column types to “Any” instead of “Float64” which is causing the issues for me, any suggestion on how can I convert them back to Float64?
Row │ month ac_hrs crew_hrs
│ String Any Any
─────┼──────────────────────────
1 │ 202404 2 4
2 │ 202405 4 6
Dan
April 9, 2024, 11:25pm
7
In your post the columns are Ints, not Float64s. So the 0
in the coalesce
makes the columns Any. Try to replace the 0
with 0.0
and it should remain Float64.