Special case handling - alternatives to if/else?

kevbonham · March 7, 2019, 10:14pm

I have an unruly monster of a data table from collaborators, and I’m writing a script to clean it up. The table is updated on a semi-regular basis, and there’s no option to get it in a better form, so I’m stuck writing a workaround. The table has a number of subtables, which are indicated in the header, so for example I might have

A1	A2	A3	B1	B2	…	Z5
foo	1.3	“a”
foo	2.4	“b”
bar	1.7	“a”
baz	3.3	“a”
			foo	8.2
			foo	9.2
			baz	10.
…	…	…	…	…	…	…
						“stuff”

And I need to process all the A columns together, all the B columns together etc. So I’ve got code that splits the table by the header, does some processing and then converts things to long form etc.

All of this works great - the question comes from the fact that, while 80% of the subtables can be processed with the same code, about 20% of them need some idiosyncratic stuff. I’ve started off doing:

const special_cases = Set(["B", "E", "Q"])

function customprocess!(table, parent)
    parent in special_cases || return table

    if parent == "B"
        # Custom code for B
    elseif parent == "E"
        # Custom code for E
    elseif parent == "Q"
        # Custom code for Q
    end
    return table
end

This works, but as my number of special cases is increasing, this feels very unwieldy. And this is a project where the table will keep getting updated and the script will keep getting used potentially for years, so I’d like something that’s a bit more maintainable than a giant if-else. Is there a good way to use multiple dispatch or some other sort of design pattern to make the code a bit more readable/maintainable?

I was thinking something like

abstract type AbstractParentTable end

struct ParentB <: AbstractParentTable end
struct ParentE <: AbstractParentTable end
struct ParentQ <: AbstractParentTable end

And then define different methods for these, but I wasn’t sure how to get an instance of ParentB (for example) from the "B" in the header. Other ideas welcome.

yurivish · March 7, 2019, 10:22pm

Could you use a Dict that maps "B" to ParentB?
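
A minimal sketch of that idea, reusing the singleton types from the original post. The names `PARENT_TYPES`, `DefaultParent`, and `parenttype` are hypothetical, and the body of the `ParentB` method is only a placeholder for whatever the real special case needs:

```julia
abstract type AbstractParentTable end

# Hypothetical fallback type for the subtables that need no special handling
struct DefaultParent <: AbstractParentTable end
struct ParentB <: AbstractParentTable end
struct ParentE <: AbstractParentTable end

# Hypothetical lookup table from header prefix to singleton type
const PARENT_TYPES = Dict{String,DataType}("B" => ParentB, "E" => ParentE)

# Prefixes missing from the Dict fall back to DefaultParent
parenttype(parent::AbstractString) = get(PARENT_TYPES, parent, DefaultParent)()

# Generic method: return the table untouched
customprocess!(table, ::AbstractParentTable) = table

# Placeholder special case for "B": double every value in place
function customprocess!(table, ::ParentB)
    table .*= 2
    return table
end
```

With this layout the caller would write `customprocess!(table, parenttype(parent))`. Adding a special case then means one new struct, one new `Dict` entry, and one new method, with no central if/elseif chain to edit.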

kevbonham · March 7, 2019, 11:10pm

Yeah, that’s certainly doable. I could make special_cases a dict and then do haskey() instead of in().

Can’t tell if that’s actually simpler though… I guess keeping the function definitions separate is still nicer. Would like a way to do it more dynamically though, maybe with a macro?

kirtsar · March 7, 2019, 11:25pm

It looks like a use case for value types; see the "Value types" section of the Types chapter in the Julia manual.
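
As a rough sketch of how that could look with the built-in `Val` type, reusing the `customprocess!` name from the original post (the body of the `:B` method is a made-up placeholder, not the real processing):

```julia
# Fallback: subtables without special handling are returned unchanged
customprocess!(table, ::Val) = table

# Placeholder special case for header prefix "B": negate every value in place
function customprocess!(table, ::Val{:B})
    table .= .-table
    return table
end

# Convenience method: turn the header string into a value type, then dispatch
customprocess!(table, parent::AbstractString) =
    customprocess!(table, Val(Symbol(parent)))
```

Calling `customprocess!(table, "B")` then reaches the `Val{:B}` method, while any prefix without its own method hits the fallback. Because the `Val` is built from a runtime string, the call is dispatched dynamically, which should be fine when it happens once per subtable rather than once per cell.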

ffevotte · March 7, 2019, 11:31pm

Yes, this is exactly what I thought, too.

To elaborate on the idea, would something like this work for you?

# Value type
struct Col{T}
end
Col(s::String) = Col{Symbol(s)}()

# Default behaviour: do nothing
process!(table, ::Col) = nothing

# Custom behaviour for column header "A": add 10 to every cell
process!(table, ::Col{:A}) = foreach(eachindex(table)) do i
    table[i] += 10
end

julia> # Sample data
       table = Dict("A" => [1,2,3], 
                    "B" => [4,5,6])
Dict{String,Array{Int64,1}} with 2 entries:
  "B" => [4, 5, 6]
  "A" => [1, 2, 3]

julia> # Process everything
       for key in keys(table)
           process!(table[key], Col(key))
       end

julia> table
Dict{String,Array{Int64,1}} with 2 entries:
  "B" => [4, 5, 6]
  "A" => [11, 12, 13]

kevbonham · March 7, 2019, 11:49pm

Oh yeah - this looks like exactly what I need, thanks! (to @kirtsar too). I’ll check out the docs and give that a spin.
