Multiple dispatch Union

I have a question and am not even sure how to phrase it correctly. In the end, I am wondering whether my (toy) implementation can be improved or made more idiomatic Julia. So this is purely a learning question.

Problem
I have a function that takes two parameters, both of type String. This function is where “all the magic happens”. If either or both of the parameters are of type Array{String,1}, I want to call that function for each element of the array(s), i.e. for every combination of elements when both are arrays.

The code below works as intended. However, I am unsure whether there is a more idiomatic Julia way to do this:

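using DataFrames

# Scalar method: the real work is done here.
function to_dataframe(foo::String, bar::String)
    DataFrame(foo=foo, bar=bar)  # Here all the magic happens in my real problem
end

# Catch-all method: accepts a String or a Vector{String} for either argument.
function to_dataframe(foo::Union{String, Array{String,1}}, bar::Union{String, Array{String,1}})
    # Wrap a lone String in a one-element vector so both inputs can be iterated.
    foo = isa(foo, String) ? [foo] : foo
    bar = isa(bar, String) ? [bar] : bar

    # Call the scalar method for every (a, b) combination and stack the results.
    df = DataFrame(foo=String[], bar=String[])
    for a in foo
        for b in bar
            df = vcat(df, to_dataframe(a, b))
        end
    end
    df
end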

Thanks for any suggestions!

Honestly, for the ‘toy’ example, I think the most Julian thing would be:

function to_dataframe(foo, bar)
    return DataFrame(foo = foo, bar = bar)
end

And ignore all the rest. Better yet, do not even define this function; just call DataFrame directly.

The user already has a simple way to mix scalars and lists:

to_dataframe([string_a], [string_b])
to_dataframe([string_a], array_b)
to_dataframe(array_a, [string_b])
to_dataframe(array_a, array_b)

Why complicate things?


It only makes sense to use a Union as an argument type if the two types have the same behavior. That’s why you see Union{AbstractString, Symbol} so often: you can sometimes write code that works the same for both.
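As a minimal sketch of that point (the name `normalize_name` is made up for illustration): both a string and a Symbol can be passed to `Symbol(...)`, so one method body can serve both types without any branching on the type.

```julia
# Hypothetical helper: a single body works unchanged for both members of the Union,
# because Symbol(x) accepts a string as well as a Symbol.
normalize_name(name::Union{AbstractString, Symbol}) = Symbol(name)

from_string = normalize_name("foo")
from_symbol = normalize_name(:foo)
```

Both calls should give the same Symbol, which is what makes the Union reasonable here.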

Strings and vectors are different types with entirely different behaviors. It is better to use multiple dispatch instead.

You are partly right, because I chose a poor toy example. However, to_dataframe(array_a, array_b) actually yields different results than calling DataFrame directly when the array sizes differ.
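To make the difference concrete, here is a small sketch in plain Julia (tuples stand in for DataFrame rows, and the variable names are only illustrative). The nested loops build every combination, whereas pairing the arrays column-wise only has partners for as many elements as the shorter array holds.

```julia
foo = ["a", "b"]
bar = ["x", "y", "z"]

# What the nested loops do: one entry per (a, b) combination, foo in the outer loop.
combos = [(a, b) for a in foo for b in bar]

# Column-wise pairing, by contrast: zip stops at the end of the shorter array.
paired = collect(zip(foo, bar))

println(length(combos), " combinations vs. ", length(paired), " pairs")
```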

I understand that. My follow-up question is then:

Should I write four functions?

  • function to_dataframe(foo::String, bar::String)
  • function to_dataframe(foo::Array{String,1}, bar::String)
  • function to_dataframe(foo::String, bar::Array{String,1})
  • function to_dataframe(foo::Array{String,1}, bar::Array{String,1})

Yes, that sounds good. Do all the work in the Array-Array version and have the other methods just convert their arguments and forward to it.
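A rough sketch of that layout, kept free of dependencies: `make_row` and `to_rows` are made-up names, and a NamedTuple stands in for whatever the real String-String function produces. With DataFrames loaded, I would expect `DataFrame(rows)` to turn such a vector of NamedTuples into a table, but treat that as an assumption to check against your version.

```julia
# Stand-in for the real per-pair work ("where all the magic happens").
make_row(foo::String, bar::String) = (foo = foo, bar = bar)

# The Vector-Vector method does all the work: one row per (a, b) combination,
# in the same order as the nested loops in the question.
function to_rows(foo::Vector{String}, bar::Vector{String})
    return [make_row(a, b) for a in foo for b in bar]
end

# The remaining methods only wrap the scalar argument and forward.
to_rows(foo::String, bar::Vector{String}) = to_rows([foo], bar)
to_rows(foo::Vector{String}, bar::String) = to_rows(foo, [bar])
to_rows(foo::String, bar::String) = to_rows([foo], [bar])

rows = to_rows("a", ["x", "y"])
```

Dispatch picks the right method from the argument types, so no isa checks are needed inside the function bodies.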

But this could be an XY problem. Maybe you should reconfigure things upstream to make sure your inputs are consistent?
