I have two dataframes as shown below and I need to create csv file or json output like shown below.
I need to get dominant row for each product for each date from two dataframes then compare them to generate below output.
Ex: df has product “ABCDEF” with categories “123” and “456” for month of May 2024 and “total” value for them respectively is “15” and “25” since 25 is higher my dominant category for product “ABCDEF” is “456”.
In similar lines df2 has product “ABCDEF” with categories “232” and “456” for month of May 2024 and “total” value for them respectively is “38” and “25” since 38 is higher my dominant category for product “ABCDEF” is “232”.
As we have dominant row from both dataframes we need to compare these two rows to find some differences respective to product, category and total fields. We need to display output as shown below with three rows for each product for each month.
product,Changetype,From_052024,To_052024
ABCDEF,Category,456(25),232(38)
ABCDEF,product,expired,new4
ABCDEF,total,63,40
In the first row above we will show dominant category from df and it’s count(value from total column) in the brackets under from section similarly under to section we show dominant category from df2 and it’s count in the brackets.
In the second section we are comparing category in df and df2 if they are not equal then we will mark “from_” column as expired and “to_” column as new.
In third row we are printing “total” for that “product” irrespective of “category” for that month.
(or)
JSON response can be below format:
[
{
"product": "ABCDEF",
"Changetype": "Category",
"From_May-2024": "456(25)",
"To_May-2024": "232(28)",
"From_Sep-2024": "xxx(xxx)",
"To_Sep-2024": "xxx(xx)"
},
{
"product": "ABCDEF",
"Changetype": "product",
"From_May-2024": "expired",
"To_May-2024": "new",
"From_Sep-2024": "xxx",
"To_Sep-2024": "xxx"
},
{
"product": "ABCDEF",
"Changetype": "total",
"From_May-2024": "63",
"To_May-2024": "40",
"From_Sep-2024": "xxx",
"To_Sep-2024": "xxx"
}
]
julia> df
5×4 DataFrame
Row │ product category dte total
│ String String Int64 Int64
─────┼──────────────────────────────────
1 │ ABCDEF 123 202405 15
2 │ XYZDEF 256 202403 20
3 │ ABCDEF 456 202405 25
4 │ WERTYU 789 202404 16
5 │ XYZDEF 261 202403 18
julia> df2
5×4 DataFrame
Row │ product category dte total
│ String String Int64 Int64
─────┼──────────────────────────────────
1 │ ABCDEF 232 202405 38
2 │ XYZDEF 256 202403 20
3 │ ABCDEF 456 202405 25
4 │ WERTYU 789 202404 16
5 │ XYZDEF 272 202403 48