I have an unruly monster of a data table from collaborators, and I’m writing a script to clean it up. The table is updated on a semi-regular basis, and there’s no option to get it in a better form, so I’m stuck writing a workaround. The table has a number of subtables, which are indicated in the header, so for example I might have:
A1 | A2 | A3 | B1 | B2 | … | Z5 | |
---|---|---|---|---|---|---|---|
foo | 1.3 | “a” | |||||
foo | 2.4 | “b” | |||||
bar | 1.7 | “a” | |||||
baz | 3.3 | “a” | |||||
foo | 8.2 | ||||||
foo | 9.2 | ||||||
baz | 10. | ||||||
… | … | … | … | … | … | … | |
“stuff” |
And I need to process all the `A` columns together, all the `B` columns together, etc. So I’ve got code that splits the table by the header, does some processing, and then converts things to long form.
All of this works great - the question comes from the fact that, while 80% of the subtables can be processed with the same code, about 20% of them need some idiosyncratic stuff. I’ve started off doing:
```julia
const special_cases = Set(["B", "E", "Q"])

function customprocess!(table, parent)
    # Most subtables need no special handling
    parent in special_cases || return table
    if parent == "B"
        # Custom code for B
    elseif parent == "E"
        # Custom code for E
    elseif parent == "Q"
        # Custom code for Q
    end
    return table
end
```
This works, but as my number of special cases is increasing, this feels very unwieldy. And this is a project where the table will keep getting updated and the script will keep getting used potentially for years, so I’d like something that’s a bit more maintainable than a giant if-else. Is there a good way to use multiple dispatch or some other sort of design pattern to make the code a bit more readable/maintainable?
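For concreteness, one non-dispatch option would be a lookup table from parent name to handler function. This is only a minimal sketch: the handler names (`process_B!` etc.) and `SPECIAL_HANDLERS` are hypothetical, and a `Vector{String}` stands in for the real table so the placeholder bodies have something visible to do.

```julia
# Hypothetical handlers: the real bodies would hold the custom code per subtable.
# Here each one just appends a marker to a stand-in "table".
process_B!(table) = push!(table, "handled B")
process_E!(table) = push!(table, "handled E")
process_Q!(table) = push!(table, "handled Q")

# Lookup table from parent name to its handler.
const SPECIAL_HANDLERS = Dict{String,Function}(
    "B" => process_B!,
    "E" => process_E!,
    "Q" => process_Q!,
)

function customprocess!(table, parent)
    # Parents without an entry fall back to `identity`, i.e. no special handling.
    handler = get(SPECIAL_HANDLERS, parent, identity)
    return handler(table)
end
```

Adding a special case would then mean writing one function and one `Dict` entry, rather than another `elseif` branch.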
I was thinking something like
```julia
abstract type AbstractParentTable end
struct ParentB <: AbstractParentTable end
struct ParentE <: AbstractParentTable end
struct ParentQ <: AbstractParentTable end
```
And then define different methods for these, but I wasn’t sure how to get a type of `ParentB` (for example) from the `"B"` in the header. Other ideas welcome.
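One possibility I’m aware of for going from a string to something dispatchable is `Val(Symbol(parent))`, which avoids declaring a struct per parent. A minimal sketch, assuming a `Vector{String}` as a stand-in for the real table and placeholder method bodies:

```julia
# Fallback: any parent without a dedicated method is returned unchanged.
customprocess!(table, ::Val) = table

# One method per special case. The bodies are placeholders that append a
# marker to a stand-in "table"; the real custom code would go here.
customprocess!(table, ::Val{:B}) = push!(table, "handled B")
customprocess!(table, ::Val{:E}) = push!(table, "handled E")
customprocess!(table, ::Val{:Q}) = push!(table, "handled Q")

# Entry point: turn the header string into a `Val` once, then dispatch on it.
customprocess!(table, parent::AbstractString) =
    customprocess!(table, Val(Symbol(parent)))
```

Called as `customprocess!(table, "B")`, this would pick the `Val{:B}` method, and `customprocess!(table, "A")` would hit the fallback. The method is chosen at runtime because the string is only known then, which is probably fine for a few dozen subtables but is not free. The same shape should work with the `ParentB`-style structs if a `Dict` mapping `"B"` to `ParentB()` replaces the `Val` conversion.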