I have a function on a DataFrame which should behave differently depending on the names / eltypes of the DataFrame's columns. Is this possible using a multiple-dispatch-like pattern?
Example:
using DataFrames

function myfunc(df::AbstractDataFrame) # call this method if columns a + b exist and are numeric
    df[!, :c] = df.a .* df.b
    return df
end

function myfunc(df::AbstractDataFrame) # call this method if columns a + c exist and are numeric
    df[!, :b] = df.c ./ df.a
    return df
end

foo = DataFrame(a=1:4, b=2:5)
bar = DataFrame(a=1:4, c=3:6)
myfunc(foo) # should call 1st definition
myfunc(bar) # should call 2nd definition
It would be possible to check the column names and eltypes inside the function, but I wonder if there is a more elegant / “Julian” way of doing this.
No, this is not possible with dispatch. Unlike NamedTuples, for example, the names of columns in a DataFrame are not stored in the type information of the data frame.
You will need to use if...else constructs to solve this problem.
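A minimal sketch of what the if...else version could look like. The helper `isnumcol` is a hypothetical name introduced here for illustration, and the choice to throw an `ArgumentError` when neither pair of columns is present is an assumption, not something the question specifies. Note that a column containing `missing` values would not count as numeric under this check.

```julia
using DataFrames

# Hypothetical helper: true if column `col` exists in `df` and its eltype is numeric.
isnumcol(df, col) = hasproperty(df, col) && eltype(df[!, col]) <: Number

function myfunc(df::AbstractDataFrame)
    if isnumcol(df, :a) && isnumcol(df, :b)
        # columns a and b are present: derive c
        df[!, :c] = df.a .* df.b
    elseif isnumcol(df, :a) && isnumcol(df, :c)
        # columns a and c are present: derive b
        df[!, :b] = df.c ./ df.a
    else
        # assumed behavior for any other layout
        throw(ArgumentError("expected numeric columns a and b, or a and c"))
    end
    return df
end

foo = DataFrame(a=1:4, b=2:5)
bar = DataFrame(a=1:4, c=3:6)
myfunc(foo) # takes the first branch and adds column c
myfunc(bar) # takes the second branch and adds column b
```

If all three columns already exist, the first branch wins and `c` is overwritten, so the order of the checks matters.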
Yeah, DataFrames specifically doesn’t encode column name/type information as type parameters, due to the desire to avoid having to recompile lots of methods.
getcolumn(df::AbstractDataFrame, name, default) = name in names(df) ? df[!, name] : default

function myfunc(df::AbstractDataFrame)
    a, b, c = (getcolumn(df, name, nothing) for name in ("a", "b", "c"))
    _myfunc(df, a, b, c)
end

function _myfunc(df::AbstractDataFrame, a::AbstractVector{<:Number}, b::AbstractVector{<:Number}, c::Nothing)
    df[!, :c] = a .* b
    return df
end
# etc.
This is probably a scenario where you don’t need dispatch. And because the compiler can’t determine the types of a, b, c inside myfunc, the call to _myfunc is resolved at run time, so you won’t get a performance benefit from this structure compared to more readable if...else code.
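For completeness, here is one way the `# etc.` in the dispatch-based version might be filled in. This is only a sketch: the `NumVec` alias, the second `_myfunc` method, and the throwing fallback are assumptions added for illustration, and `hasproperty` is used in place of `name in names(df)` so that column names can be passed as symbols.

```julia
using DataFrames

# Return the column if it exists, otherwise `default`.
getcolumn(df::AbstractDataFrame, name, default) =
    hasproperty(df, name) ? df[!, name] : default

# Hypothetical alias to keep the method signatures short.
const NumVec = AbstractVector{<:Number}

# Look up the three columns (or `nothing`) and let dispatch pick the method.
myfunc(df::AbstractDataFrame) =
    _myfunc(df, (getcolumn(df, n, nothing) for n in (:a, :b, :c))...)

# a and b present, c absent: derive c
function _myfunc(df::AbstractDataFrame, a::NumVec, b::NumVec, ::Nothing)
    df[!, :c] = a .* b
    return df
end

# a and c present, b absent: derive b
function _myfunc(df::AbstractDataFrame, a::NumVec, ::Nothing, c::NumVec)
    df[!, :b] = c ./ a
    return df
end

# Assumed fallback for every other combination, including all three present.
_myfunc(df::AbstractDataFrame, a, b, c) =
    throw(ArgumentError("expected numeric columns a and b, or a and c"))

foo = myfunc(DataFrame(a=1:4, b=2:5)) # dispatches to the first method
bar = myfunc(DataFrame(a=1:4, c=3:6)) # dispatches to the second method
```

The method selection here still happens at run time, which is consistent with the point above that this layout is a matter of taste rather than speed.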