Multiple dispatch usage with named arguments

Hi,
I am trying to write a function which has multiple arguments. The argument maf could be a single value or a vector of length p.

using Distributions   # provides Binomial

function simulate_data(; n::Int64 = 1000, p::Int64 = 5, maf = 0.3)
    if length(maf) == 1
        mnaf = fill(maf, p)   # make maf a vector of length p if only one value is specified
    elseif length(maf) == p
        mnaf = maf
    else
        error("maf argument is incorrect")
    end
    rand.(Binomial.(1, mnaf), n, 1)
    # other code
end

My understanding of multiple dispatch is that I should define two methods:

function simulate_data(n::Int64, p::Int64, maf::Float64)
    rand.(Binomial.(1, fill(maf, p)), n, 1)
    # other code
end

function simulate_data(n::Int64, p::Int64, maf::Array{Float64,1})
    rand.(Binomial.(1, maf), n, 1)
    # other code
end

This works. However, is this recommended? It duplicates a lot of code just to change a couple of lines (assuming "# other code" does a lot of work), and the purpose of writing functions in the first place was to reduce duplication.

Does multiple dispatch not work with named arguments?

function simulate_data(; n::Int64 = 1000, p::Int64 = 5, maf::Float64 = 0.3)
    rand.(Binomial.(1, fill(maf, p)), n, 1)
    # other code
end

function simulate_data(; n::Int64 = 1000, p::Int64 = 5, maf::Array{Float64,1} = [0.3, 0.3, 0.3, 0.3, 0.3])
    rand.(Binomial.(1, maf), n, 1)
    # other code
end

In this scenario the second definition overwrites the first, so the method with maf::Float64 no longer exists.

Could someone please help me understand how to use multiple dispatch properly?

Thank You

Only positional arguments participate in multiple dispatch; keyword arguments do not. The main purpose of keyword arguments is to optionally override defaults in the function. A good example is the docstring for sort:

sort(A; dims::Integer, alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)

  Sort a multidimensional array A along the given dimension. See sort! for a description of possible keyword
  arguments.

The positional argument A is where dispatch occurs, while the keyword arguments let you override the default behavior of the sort, for example by passing a different comparator.
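To see the difference concretely, here is a minimal sketch. The function names describe_kw and describe_pos are hypothetical and exist only for this illustration. Two definitions that differ only in the type of a keyword argument share the same (empty) positional signature, so the second replaces the first, whereas two definitions that differ in a positional argument type coexist as separate methods.

```julia
# Keyword-only definitions: both have the same positional signature `()`,
# so the second definition replaces the first.
describe_kw(; x::Float64 = 0.3) = "scalar"
describe_kw(; x::Vector{Float64} = [0.3]) = "vector"

# Positional definitions: the argument types differ, so both methods are kept.
describe_pos(x::Float64) = "scalar"
describe_pos(x::Vector{Float64}) = "vector"

n_kw  = length(methods(describe_kw))    # 1: only the Vector version survives
n_pos = length(methods(describe_pos))   # 2: dispatch picks one by argument type

describe_pos(0.3)      # "scalar"
describe_pos([0.3])    # "vector"
describe_kw()          # "vector"
# describe_kw(x = 0.3) would throw a TypeError, because x must now be a Vector{Float64}
```

This matches what the original post observed: after the second keyword definition, the maf::Float64 version is gone.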


A typical pattern is to use one method to call another more general one, e.g.

function simulate_data(n::Int64, p::Int64, maf::Array{Float64 ,1})
  rand.(Binomial.(1, maf), n, 1)
  # other code
end
simulate_data(n::Int64, p::Int64, maf::Float64) = 
  simulate_data(n, p, fill(maf,p))

This is where short, one-line function definitions are very useful.
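The same forwarding idea can keep the keyword interface from the original question. The following is only a sketch under some assumptions: expand_maf is a hypothetical helper name, and Base's rand(n) .< m stands in for rand(Binomial(1, m), n) so that the example runs without Distributions.jl. Dispatch happens on the positional argument of the helper, and the keyword function holds the shared body once.

```julia
# Hypothetical helper: dispatch happens here, on the positional `maf`.
expand_maf(maf::Real, p::Integer) = fill(float(maf), p)

function expand_maf(maf::AbstractVector{<:Real}, p::Integer)
    length(maf) == p ||
        throw(ArgumentError("maf must have length p = $p, got $(length(maf))"))
    return float.(maf)
end

# Keyword front end: normalize `maf` once, then run the shared body.
function simulate_data(; n::Integer = 1000, p::Integer = 5, maf = 0.3)
    mafs = expand_maf(maf, p)
    # One vector of n Bernoulli(m) draws per entry of mafs
    # (assumption: stands in for rand(Binomial(1, m), n) from Distributions.jl).
    return [rand(n) .< m for m in mafs]
    # other code would go here, written only once
end

draws_scalar = simulate_data(n = 10, p = 3, maf = 0.2)         # scalar maf, expanded to length 3
draws_vector = simulate_data(n = 4, p = 2, maf = [0.0, 1.0])   # vector maf, used as given
```

A vector of the wrong length is rejected with an ArgumentError instead of a generic error, which makes the failure easier to trace to its cause.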

If you want to use keyword arguments for dispatch, you can try my KeywordDispatch.jl package.


I think the best way to structure your code depends heavily on the context.
This is my advice (all opinions, other people might disagree):

1.) It’s usually a good idea to keep the number of arguments to a function as low as possible. Requiring many (~ more than 2) positional arguments or many named arguments is a sign that your function does more than one thing and can be split up.

For things like simulation runs, having some named arguments like in your case is usually fine, but it might be a good idea to group them into a struct if there are too many. There is also Parameters.jl which might be of interest.

2.) You could define two methods without duplicating your code by putting the full implementation in the one that takes a Vector, and having the one that takes a Float64 just call it.

3.) There is (usually) nothing wrong with a bit of control flow like in your first case. Multiple dispatch is a useful tool and if you don’t need it, that’s fine. If maf = 0.3 is really just meant as a simple way to express something like maf = [0.3, 0.3, 0.3, 0.3, 0.3], I would just stick with your first example. You can replace the if statements with a ternary operator if you like:

function simulate_data(n, p, maf)
    mnaf = length(maf) == 1 ? fill(maf, p) : maf
    ...
end
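One caveat with the length-based check, shown as a small sketch (by_length and by_type are hypothetical names used only here): a one-element vector also has length 1, so when p is 1 and maf is passed as a vector, fill wraps it a second time. Checking the type instead avoids this.

```julia
# Length-based check, as in the ternary above.
by_length(maf, p) = length(maf) == 1 ? fill(maf, p) : maf

# Type-based check: only true scalars are expanded.
by_type(maf, p) = maf isa Number ? fill(maf, p) : maf

from_scalar   = by_length(0.3, 3)     # [0.3, 0.3, 0.3]
nested_vector = by_length([0.3], 1)   # [[0.3]], a Vector{Vector{Float64}}
flat_vector   = by_type([0.3], 1)     # [0.3], passed through unchanged
```

If maf is never passed as a one-element vector, the two versions behave the same.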

Thanks for the replies! All the posts had important stuff I didn’t know.

Thank you!