Say I have two DataFrames, `df1` and `df2`. They contain different values but share the same column names: `Customer`, `Cost`, and `Product`. As I move forward with my data processing, sometimes I want to harmonize the DataFrames on `Customer` and `Cost` via a join and then run a specific set of analyses on the result. Other times, I want to harmonize them on `Customer` and `Product` via a join and run a different set of analyses. What would be the best approach for handling these different analyses?
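For concreteness, the two joins can be sketched with DataFrames.jl's `innerjoin`. This is a minimal sketch under assumptions: the toy values are made up (only the column names come from the question), an inner join is assumed, and `makeunique = true` is passed because the non-key column present in both tables would otherwise make `innerjoin` throw an error.

```julia
using DataFrames

# Hypothetical toy data; only the column names come from the question.
df1 = DataFrame(Customer = ["Ann", "Bob", "Cy"],
                Cost     = [10, 20, 30],
                Product  = ["pen", "ink", "pad"])
df2 = DataFrame(Customer = ["Ann", "Bob", "Cy"],
                Cost     = [10, 25, 30],
                Product  = ["pen", "ink", "cap"])

# Harmonize on Customer and Cost: Product appears in both tables,
# so the right-hand copy is renamed to Product_1.
by_cost = innerjoin(df1, df2; on = [:Customer, :Cost], makeunique = true)

# Harmonize on Customer and Product: now Cost is the clashing column
# and the right-hand copy becomes Cost_1.
by_product = innerjoin(df1, df2; on = [:Customer, :Product], makeunique = true)
```

Note that `innerjoin` does not guarantee a particular row order unless you ask for one, so downstream analyses should not rely on it.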
My thought was that, to use multiple dispatch for this analysis, I would:
- Create one function called `harmonize` with the following arguments: `df_1`, `df_2`, and a keyword argument `analysis::Symbol`.
- To accomplish the different analyses described above, write two methods of `harmonize` (please see Edit 1 for more details on these functions):
  - `harmonize(df_1, df_2; analysis::Symbol = :CustomerCost)`
  - `harmonize(df_1, df_2; analysis::Symbol = :CustomerProduct)`
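One thing worth checking with these two signatures: Julia dispatches only on positional arguments, never on keyword arguments or their default values. Both definitions therefore have the same dispatch signature, `harmonize(df_1, df_2)`, and the second overwrites the first. A small sketch with hypothetical placeholder bodies that just return a string:

```julia
# Two definitions that differ only in the default of a keyword argument.
# Keywords are not part of the signature used for dispatch, so both lines
# define the same method harmonize(df_1, df_2); the second replaces the first.
harmonize(df_1, df_2; analysis::Symbol = :CustomerCost)    = "Customer/Cost analysis"
harmonize(df_1, df_2; analysis::Symbol = :CustomerProduct) = "Customer/Product analysis"

length(methods(harmonize))                               # 1
harmonize(nothing, nothing)                              # "Customer/Product analysis"
harmonize(nothing, nothing; analysis = :CustomerCost)    # "Customer/Product analysis"
```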
So far, this is working for my analysis. However, I was wondering whether this is an abuse of multiple dispatch, or an incorrect way of thinking about multiple dispatch in data science. What are people's thoughts on my proposed analysis pipeline?
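If the goal is to let dispatch pick the analysis, one common Julia pattern is to represent each analysis as its own singleton type and pass an instance as a positional argument. The sketch below assumes DataFrames.jl and an inner join; the type names `CustomerCost`/`CustomerProduct`, the helpers `join_keys` and `analyze`, and the placeholder analysis bodies are all hypothetical.

```julia
using DataFrames

# Hypothetical analysis types. Each analysis is its own type, so it can be
# passed as a positional argument and take part in dispatch.
abstract type Analysis end
struct CustomerCost    <: Analysis end
struct CustomerProduct <: Analysis end

# Columns each analysis joins on (taken from the question).
join_keys(::CustomerCost)    = [:Customer, :Cost]
join_keys(::CustomerProduct) = [:Customer, :Product]

# Placeholder analyses; replace with the real ones.
# Total cost over the rows matched on Customer and Cost.
analyze(::CustomerCost, df)    = sum(df.Cost)
# Number of rows matched on Customer and Product.
analyze(::CustomerProduct, df) = nrow(df)

# Shared pipeline: join on the analysis-specific keys, then run the
# analysis selected by the type of `a`.
function harmonize(df_1::AbstractDataFrame, df_2::AbstractDataFrame, a::Analysis)
    # makeunique = true renames the clashing non-key column (e.g. Product_1)
    joined = innerjoin(df_1, df_2; on = join_keys(a), makeunique = true)
    return analyze(a, joined)
end
```

Usage would look like `harmonize(df1, df2, CustomerCost())`. Adding a new analysis then means adding a new type plus its `join_keys` and `analyze` methods, without touching `harmonize` itself.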
P.S. If you would like me to add more clarity or information to this post, let me know. I struggled somewhat to articulate what I was trying to say here.
Edit 1: The two functions would be methods that look like this:
```julia
function harmonize(df_1, df_2; analysis::Symbol = :CustomerCost)
    # Code which does the analysis for a Customer and Cost join
end

function harmonize(df_1, df_2; analysis::Symbol = :CustomerProduct)
    # Code which does the analysis for a Customer and Product join
end
```
In this example, there is no if-else logic.
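Keeping the `analysis::Symbol` keyword as the user-facing interface is also possible without if-else logic: since Julia does not dispatch on keyword arguments, the keyword method can forward the symbol to a positional `Val`, which Julia can dispatch on. A minimal sketch, where `_harmonize` is a hypothetical internal helper and the bodies are placeholders:

```julia
# Public entry point keeps the Symbol keyword from the question and
# forwards it to a positional Val, which does take part in dispatch.
harmonize(df_1, df_2; analysis::Symbol = :CustomerCost) =
    _harmonize(df_1, df_2, Val(analysis))

# Hypothetical internal methods, one per analysis (placeholder bodies).
_harmonize(df_1, df_2, ::Val{:CustomerCost})    = "Customer/Cost analysis"
_harmonize(df_1, df_2, ::Val{:CustomerProduct}) = "Customer/Product analysis"

harmonize(nothing, nothing)                              # "Customer/Cost analysis"
harmonize(nothing, nothing; analysis = :CustomerProduct) # "Customer/Product analysis"
# harmonize(nothing, nothing; analysis = :Other) throws a MethodError
```

Because `analysis` is only known at run time, `Val(analysis)` results in a dynamic dispatch; in a data-analysis pipeline that overhead is usually small compared with the joins themselves.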