Compare data from two dataframes

I have two dataframes as shown below and I need to create csv file or json output like shown below.

I need to get dominant row for each product for each date from two dataframes then compare them to generate below output.

Ex: df has product “ABCDEF” with categories “123” and “456” for month of May 2024 and “total” value for them respectively is “15” and “25” since 25 is higher my dominant category for product “ABCDEF” is “456”.

In similar lines df2 has product “ABCDEF” with categories “232” and “456” for month of May 2024 and “total” value for them respectively is “38” and “25” since 38 is higher my dominant category for product “ABCDEF” is “232”.

As we have dominant row from both dataframes we need to compare these two rows to find some differences respective to product, category and total fields. We need to display output as shown below with three rows for each product for each month.

product,Changetype,From_052024,To_052024
ABCDEF,Category,456(25),232(38)
ABCDEF,product,expired,new4
ABCDEF,total,63,40

In the first row above we will show dominant category from df and it’s count(value from total column) in the brackets under from section similarly under to section we show dominant category from df2 and it’s count in the brackets.

In the second section we are comparing category in df and df2 if they are not equal then we will mark “from_” column as expired and “to_” column as new.

In third row we are printing “total” for that “product” irrespective of “category” for that month.

(or)

JSON response can be below format:

[
  {
    "product": "ABCDEF",
    "Changetype": "Category",
    "From_May-2024": "456(25)",
    "To_May-2024": "232(28)",
    "From_Sep-2024": "xxx(xxx)",
    "To_Sep-2024": "xxx(xx)"
  },
  {
    "product": "ABCDEF",
    "Changetype": "product",
    "From_May-2024": "expired",
    "To_May-2024": "new",
    "From_Sep-2024": "xxx",
    "To_Sep-2024": "xxx"
  },
  {
    "product": "ABCDEF",
    "Changetype": "total",
    "From_May-2024": "63",
    "To_May-2024": "40",
    "From_Sep-2024": "xxx",
    "To_Sep-2024": "xxx"
  }
]
julia> df
5×4 DataFrame
 Row │ product  category  dte     total 
     │ String   String    Int64   Int64 
─────┼──────────────────────────────────
   1 │ ABCDEF   123       202405     15 
   2 │ XYZDEF   256       202403     20 
   3 │ ABCDEF   456       202405     25 
   4 │ WERTYU   789       202404     16 
   5 │ XYZDEF   261       202403     18
julia> df2
5×4 DataFrame
 Row │ product  category  dte     total 
     │ String   String    Int64   Int64 
─────┼──────────────────────────────────
   1 │ ABCDEF   232       202405     38
   2 │ XYZDEF   256       202403     20
   3 │ ABCDEF   456       202405     25
   4 │ WERTYU   789       202404     16
   5 │ XYZDEF   272       202403     48

What is your question? What did you try so far?

My apologies as I forgot to put up note. I know that we could achieve this using dictionaries by writing multiple for loops and trying to check if this is the efficient approach or if we have any better solution to this. I am trying to avoid writing loops as my df can be huge and would slow down my app response.

I’m not sure I fully understand your question but I think you are looking for leftjoin followed by a groupby over product.