Hi all,
Say I have two DataFrames: df1 and df2. They have unique values but the same column names: Cost, Customer, and Product. As I move forward with my data processing, I sometimes want to harmonize the DataFrames on Customer and Cost via a join and then run a specific set of analyses on that harmonization. Other times, I want to harmonize the DataFrames on Customer and Product via a join and run a different set of analyses. What would be the best approach to handling these different analyses when manipulating the datasets?
My thought was that, if I want to use multiple dispatch for this analysis, I would:
- Create one function called harmonize with the following arguments: harmonize(df_1, df_2)
- To accomplish the different analyses described above, dispatch on harmonize with these two dispatches (please see Edit 1 for more details on these functions):
  harmonize(df_1, df_2 ; analysis::Symbol = :CustomerCost)
  harmonize(df_1, df_2 ; analysis::Symbol = :CustomerProduct)
So far, this is working for my analysis. However, I was wondering whether this is an abuse of multiple dispatch, or simply the wrong way to think about multiple dispatch for data science. What are people's thoughts on my proposed analysis pipeline?
Thank you!
~ tcp
P.S. If you want me to add any more clarity/information to this post, let me know. I somewhat struggled to articulate what I was trying to say here.
Edit 1:
The two functions would be dispatches that look like this:
function harmonize(df_1, df_2; analysis::Symbol = :CustomerCost)
    # Code which does the analysis for a Customer and Cost join
end
and
function harmonize(df_1, df_2; analysis::Symbol = :CustomerProduct)
    # Code which does the analysis for a Customer and Product join
end
In this example, there is no if-else logic happening.
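One caveat worth checking against the two definitions above: according to the Julia manual, keyword arguments do not take part in method dispatch. Two definitions with the same positional signature, harmonize(df_1, df_2), are therefore the same method, and the second definition replaces the first instead of adding an alternative. Below is a minimal sketch of one way to let dispatch pick the analysis, assuming DataFrames.jl and an inner join. The marker types CustomerCost and CustomerProduct, the choice of innerjoin, and the sample data are all hypothetical and only illustrate the idea.

```julia
using DataFrames

# Hypothetical marker types, one per analysis (names are illustrative only).
abstract type Analysis end
struct CustomerCost <: Analysis end
struct CustomerProduct <: Analysis end

# The method is chosen by the type of the third positional argument.
function harmonize(df_1::AbstractDataFrame, df_2::AbstractDataFrame, ::CustomerCost)
    # Assumption: an inner join; makeunique = true renames the clashing
    # non-key column (Product becomes Product_1 on the right-hand side).
    return innerjoin(df_1, df_2; on = [:Customer, :Cost], makeunique = true)
end

function harmonize(df_1::AbstractDataFrame, df_2::AbstractDataFrame, ::CustomerProduct)
    return innerjoin(df_1, df_2; on = [:Customer, :Product], makeunique = true)
end

# Made-up example data with the three columns from the post.
df_1 = DataFrame(Customer = ["a", "b"], Cost = [1, 2], Product = ["x", "y"])
df_2 = DataFrame(Customer = ["a", "b"], Cost = [1, 3], Product = ["x", "z"])

by_cost = harmonize(df_1, df_2, CustomerCost())
by_product = harmonize(df_1, df_2, CustomerProduct())
```

Each analysis then lives in its own method, and a further analysis can be added by defining another marker type and another method, without touching the existing ones. Whether this is preferable to two plainly named functions is a judgment call.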