I have an unruly and monster data table from collaborators, and I’m writing a script to clean it up. The table is updated on a semi-regular basis, and there’s no option to get it in a better form, so I’m stuck trying to write a workaround. The table has a number of subtables, which are indicated in the header, so for example I might have
And I need to process all the `A` columns together, all the `B` columns together, etc. So I’ve got code that splits the table by the header, does some processing and then converts things to long form etc.
All of this works great - the question comes from the fact that, while 80% of the subtables can be processed with the same code, about 20% of them need some idiosyncratic stuff. I’ve started off doing:

    const special_cases = Set(["B", "E", "Q"])

    function customprocess!(table, parent)
        !in(parent, special_cases) && return table
        if parent == "B"
            # Custom code for B
        elseif parent == "E"
            # Custom code for E
        elseif parent == "Q"
            # Custom code for Q
        end
        return table
    end

This works, but as my number of special cases is increasing, this feels very unwieldy. And this is a project where the table will keep getting updated and the script will keep getting used potentially for years, so I’d like something that’s a bit more maintainable than a giant if-else. Is there a good way to use multiple dispatch or some other sort of design pattern to make the code a bit more readable/maintainable?
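One "other design pattern" that might fit here is a lookup table from header label to handler function. This is only a minimal sketch: `process_B!`, `process_E!` and `CUSTOM_HANDLERS` are hypothetical names, and a `Vector{String}` stands in for the real table so the example is self-contained.

```julia
# Hypothetical handlers; the bodies stand in for the real per-subtable code.
process_B!(table) = push!(table, "custom B step")
process_E!(table) = push!(table, "custom E step")

# Registry mapping header labels to their handler (assumed name).
const CUSTOM_HANDLERS = Dict{String,Function}(
    "B" => process_B!,
    "E" => process_E!,
)

function customprocess!(table, parent)
    # Fall back to `identity` when no special case is registered.
    handler = get(CUSTOM_HANDLERS, parent, identity)
    handler(table)
    return table
end
```

With this layout, adding a special case means writing one function and adding one line to the dictionary, and the `special_cases` set is no longer needed because the dictionary keys play that role.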
I was thinking something like

    abstract type AbstractParentTable end

    struct ParentB <: AbstractParentTable end
    struct ParentE <: AbstractParentTable end
    struct ParentQ <: AbstractParentTable end

And then define different methods for these, but I wasn’t sure how to get a type of `ParentB` (for example) from the `"B"` in the header. Other ideas welcome.
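One possible way to go from the string `"B"` to something dispatchable, without declaring a struct per subtable, is Julia's built-in `Val` type: `Val(Symbol("B"))` produces a value of type `Val{:B}`, and methods can be specialized on that. The sketch below is an illustration under assumptions, not a tested solution for the real data: the method bodies are placeholders and a `Vector{String}` stands in for the table.

```julia
# Generic fallback: subtables without a special case pass through unchanged.
customprocess!(table, ::Val) = table

# One method per special case; the bodies are placeholders.
customprocess!(table, ::Val{:B}) = push!(table, "custom B step")
customprocess!(table, ::Val{:E}) = push!(table, "custom E step")

# Convert the header string to a `Val` once, at the entry point.
customprocess!(table, parent::AbstractString) =
    customprocess!(table, Val(Symbol(parent)))
```

Calling `customprocess!(table, "B")` then reaches the `Val{:B}` method, while an unlisted label such as `"A"` hits the fallback. Building a `Val` from a runtime string means the method is chosen by dynamic dispatch, which should be negligible when it happens once per subtable. If the explicit `ParentB`-style structs are preferred, a `Dict` from label to instance (for example `"B" => ParentB()`) would serve the same purpose.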