Special case handling - alternatives to if/else?


I have an unruly monster of a data table from collaborators, and I’m writing a script to clean it up. The table is updated on a semi-regular basis, and there’s no option to get it in a better form, so I’m stuck writing a workaround. The table has a number of subtables, which are indicated by the column prefix in the header, so for example I might have

A1 A2 A3 B1 B2 Z5
foo 1.3 “a”
foo 2.4 “b”
bar 1.7 “a”
baz 3.3 “a”
foo 8.2
foo 9.2
baz 10.

And I need to process all the A columns together, all the B columns together etc. So I’ve got code that splits the table by the header, does some processing and then converts things to long form etc.

All of this works great - the question comes from the fact that, while 80% of the subtables can be processed with the same code, about 20% of them need some idiosyncratic stuff. I’ve started off doing:

const special_cases = Set(["B", "E", "Q"])

function customprocess!(table, parent)
    # Nothing to do for subtables that are not special cases
    parent in special_cases || return table

    if parent == "B"
        # Custom code for B
    elseif parent == "E"
        # Custom code for E
    elseif parent == "Q"
        # Custom code for Q
    end
    return table
end

This works, but as my number of special cases is increasing, this feels very unwieldy. And this is a project where the table will keep getting updated and the script will keep getting used potentially for years, so I’d like something that’s a bit more maintainable than a giant if-else. Is there a good way to use multiple dispatch or some other sort of design pattern to make the code a bit more readable/maintainable?

I was thinking something like

abstract type AbstractParentTable end

struct ParentB <: AbstractParentTable end
struct ParentE <: AbstractParentTable end
struct ParentQ <: AbstractParentTable end

And then define different methods for these, but I wasn’t sure how to get a type of ParentB (for example) from the "B" in the header. Other ideas welcome.



Could you use a Dict that maps "B" to ParentB?
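A rough sketch of what that could look like, reusing the type names from the question. `DefaultParent`, `PARENT_TYPES` and `parenttype` are made-up names for illustration, and the "double every value" body is only a placeholder for the real custom code:

```julia
abstract type AbstractParentTable end

struct DefaultParent <: AbstractParentTable end  # hypothetical fallback marker
struct ParentB <: AbstractParentTable end
struct ParentE <: AbstractParentTable end

# Map header prefixes to marker instances (hypothetical registry).
const PARENT_TYPES = Dict{String,AbstractParentTable}(
    "B" => ParentB(),
    "E" => ParentE(),
)

# Unknown prefixes fall back to DefaultParent.
parenttype(parent::AbstractString) = get(PARENT_TYPES, parent, DefaultParent())

# Default: return the table untouched.
customprocess!(table, ::AbstractParentTable) = table

# Placeholder special case for B: double every value in place.
function customprocess!(table, ::ParentB)
    table .*= 2
    return table
end

# Entry point that takes the prefix string from the header.
customprocess!(table, parent::AbstractString) =
    customprocess!(table, parenttype(parent))
```

With this, `customprocess!(subtable, "B")` hits the `ParentB` method, and any prefix missing from the Dict goes through the default. Adding a special case means one new struct, one Dict entry and one method.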



Yeah, that’s certainly doable. I could make special_cases a dict and then do haskey() instead of in().

Can’t tell if that’s actually simpler though… I guess keeping the function definitions separate is still nicer. Would like a way to do it more dynamically though, maybe with a macro?



It looks like a use-case for Value types: https://docs.julialang.org/en/v1/manual/types/#"Value-types"-1
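For instance, with the built-in `Val` type, a minimal sketch might be the following. `process_subtable!` is a made-up name, and the "add 1" body is only a stand-in for whatever B actually needs:

```julia
# Fallback for every prefix without a dedicated method: return the table unchanged.
process_subtable!(table, ::Val) = table

# Placeholder special case for prefix "B": add 1 to every element in place.
function process_subtable!(table, ::Val{:B})
    table .+= 1
    return table
end

# Turn the header prefix into a value type at the call site.
process_subtable!(table, parent::AbstractString) =
    process_subtable!(table, Val(Symbol(parent)))
```

Because the prefix is only known at run time, the `Val(Symbol(parent))` call is dispatched dynamically. That costs a little per call, which should not matter when it happens once per subtable rather than once per row.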



Yes, this is exactly what I thought, too.

To elaborate on the idea, would something like this work for you?

# Value type: an empty struct whose type parameter carries the column name
struct Col{T} end
Col(s::String) = Col{Symbol(s)}()

# Default behaviour: do nothing
process!(table, ::Col) = nothing

# Custom behaviour for column header "A": add 10 to every cell
process!(table, ::Col{:A}) = foreach(eachindex(table)) do i
    table[i] += 10
end

julia> # Sample data
       table = Dict("A" => [1,2,3], 
                    "B" => [4,5,6])
Dict{String,Array{Int64,1}} with 2 entries:
  "B" => [4, 5, 6]
  "A" => [1, 2, 3]

julia> # Process everything
       for key in keys(table)
           process!(table[key], Col(key))
       end

julia> table
Dict{String,Array{Int64,1}} with 2 entries:
  "B" => [4, 5, 6]
  "A" => [11, 12, 13]


Oh yeah - this looks like exactly what I need, thanks! (to @kirtsar too). I’ll check out the docs and give that a spin.