Dispatch on DataFrame columns

Hi,

I have a function on a DataFrame which should behave differently depending on the names / eltypes of the DataFrame columns. Is this possible using a multiple-dispatch-like pattern?

Example:

using DataFrames
function myfunc(df::AbstractDataFrame) # call this method if columns a + b exist and are numeric
    df[!, :c] = df.a .* df.b
    return df
end
function myfunc(df::AbstractDataFrame) # call this method if columns a + c exist and are numeric
    df[!, :b] = df.c ./ df.a
    return df
end

foo = DataFrame(a=1:4, b=2:5)
bar = DataFrame(a=1:4, c=3:6)
myfunc(foo) # should call 1st definition
myfunc(bar) # should call 2nd definition

It would be possible to check the column names and eltypes inside the function, but I wonder if there is a more elegant / “Julian” way of doing this.

No, this is not possible with dispatch. Unlike NamedTuples, for example, the names of columns in a DataFrame are not stored in the type information of the data frame.

You will need to use if...else constructs to solve this problem.
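
A minimal sketch of what that could look like for the example above. The helper `isnumcol` is a name made up for illustration, not part of DataFrames, and the sketch assumes that columns allowing `missing` should not count as numeric:

```julia
using DataFrames

# Hypothetical helper: true if column `name` exists in `df` and its
# element type is a subtype of Number (so Union{Missing, Int} is rejected).
isnumcol(df, name) = hasproperty(df, name) && eltype(df[!, name]) <: Number

function myfunc(df::AbstractDataFrame)
    if isnumcol(df, :a) && isnumcol(df, :b)
        # columns a and b are present and numeric: derive c
        df[!, :c] = df.a .* df.b
    elseif isnumcol(df, :a) && isnumcol(df, :c)
        # columns a and c are present and numeric: derive b
        df[!, :b] = df.c ./ df.a
    else
        throw(ArgumentError("expected numeric columns a and b, or a and c"))
    end
    return df
end
```

Note that the branches are checked in order, so a data frame that has all three columns takes the first branch and overwrites c.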

Yeah, DataFrames specifically doesn’t encode column name/type information as type parameters, due to the desire to avoid having to recompile lots of methods.

For “typed” tables, take a look at IndexedTables.jl, TypedTables.jl, or just using a “columntable” (a NamedTuple of vectors).
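
As a rough sketch of the columntable route: the column names and element types of a NamedTuple of vectors are part of its type, so the two methods from the question can be told apart by dispatch. The alias `NumVec` is a name introduced here for illustration, and the methods return a new NamedTuple rather than mutating the input, since NamedTuples are immutable:

```julia
# Illustrative alias (not from any package): any vector of numbers.
const NumVec = AbstractVector{<:Number}

# Called for tables with exactly the numeric columns a and b, in that order.
myfunc(t::NamedTuple{(:a, :b), <:Tuple{NumVec, NumVec}}) =
    merge(t, (c = t.a .* t.b,))

# Called for tables with exactly the numeric columns a and c, in that order.
myfunc(t::NamedTuple{(:a, :c), <:Tuple{NumVec, NumVec}}) =
    merge(t, (b = t.c ./ t.a,))

foo = (a = 1:4, b = 2:5)
bar = (a = 1:4, c = 3:6)
myfunc(foo)  # dispatches to the first method
myfunc(bar)  # dispatches to the second method
```

The signatures here match only on the exact set and order of column names, so a table with extra columns would raise a MethodError. An existing DataFrame can be converted with `Tables.columntable(df)` from Tables.jl.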

An if statement should suffice. It’s going to be fast.

Maybe like this?

getcolumn(df::AbstractDataFrame, name, default) = name in names(df) ? df[!,name] : default
function myfunc(df::AbstractDataFrame)
    a,b,c = (getcolumn(df,name,nothing) for name in ("a", "b", "c"))
    _myfunc(df,a,b,c)
end
function _myfunc(df::AbstractDataFrame, a::AbstractVector{<:Number}, b::AbstractVector{<:Number}, c::Nothing)
    df[!, :c] = a .* b
    return df
end
# etc.

This is probably a scenario where you don’t need dispatch. And because the compiler can’t infer the types of a, b, and c inside myfunc, the call to _myfunc is resolved at run time, so you won’t get a performance benefit from this structure compared to more readable if...else code.

Thanks for your responses!
I will stick to the “classical” if...else pattern. Just wanted to make sure that I wasn’t missing something.