I’ve been running into the following question: What is the most “sensible” or idiomatic way to select a function implementation based on a runtime argument that identifies the implementation to use? By "runtime argument" I mean something that could, e.g., be passed in by the user as a command-line argument.
For illustration, I’ll use the example of reading data from different file formats. A function `read_file(path, format)` could take the arguments `path` (giving the file path) and `format`, which could be a `Symbol` or `String` indicating what format the given file is in. How should this choice between different versions of `read_file` (one for each possible file format) be implemented?
It seems like a mundane thing, but somehow in Julia the choice doesn’t seem obvious to me. That might be because of “TIMTOWTDI” (there’s more than one way to do it). In Python, for example, approach (2b) (see below) seems like the most common and natural way of doing it. (I suppose something like approach (1b) could be implemented in a similar fashion using OOP/classes, so this is probably just my general distaste for approach (1b) steering me away from it.)
Another thing to note is that these functions will generally be expensive to call (e.g. because they do file IO). It therefore doesn’t matter much if some of the approaches add slightly more call overhead than others.
Generally, I see two main ways of implementing the function/method selection. Either
- (1) by using Julia’s dispatch system, i.e. translating the “file format” information into the type domain, or
- (2) by using normal control flow, i.e. `if`/`else` branches.
I came across two versions of approach (1):
Approach (1a)
```julia
read_file(path, ::Val{:hdf5}) = ...
read_file(path, ::Val{:binary}) = ...
read_file(path, ::Val{:text}) = ...
read_file(path, format::Symbol) = read_file(path, Val(format))
```
I’m aware of the performance tips on “values-as-parameters”, but as explained above, some function call overhead won’t make much of a difference here. On the other hand, it would of course matter if this has a more serious impact than I think, e.g. by slowing down compilation even in cases where the `read_file` function is not involved. Otherwise, syntactically and organizationally, this looks really clear and neat to me!
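Below is a minimal runnable sketch of (1a). The readers are hypothetical stubs that return a string instead of doing real file IO. Two methods are additions for illustration, not part of the original:
- an `AbstractString` method, showing how a command-line argument could enter the dispatch path;
- a `Val{F}` fallback, which turns an unsupported format into a readable error instead of a `MethodError`.

```julia
# Stub readers (hypothetical): each just reports which method ran.
read_file(path, ::Val{:hdf5})   = "hdf5 reader: $path"
read_file(path, ::Val{:binary}) = "binary reader: $path"
read_file(path, ::Val{:text})   = "text reader: $path"

# Fallback for any Val not matched by a more specific method above.
read_file(path, ::Val{F}) where {F} = error("Unknown format: $F")

# Runtime entry points. Val(format) is built from a runtime value,
# so the inner call is resolved by dynamic dispatch.
read_file(path, format::Symbol) = read_file(path, Val(format))
read_file(path, format::AbstractString) = read_file(path, Symbol(format))
```

With these definitions, `read_file("data.txt", "text")` ends up in the `Val{:text}` method. Because `Val(format)` depends on a runtime value, that call cannot be resolved at compile time. With expensive IO behind each method, that cost should be negligible, which is consistent with the point above.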
Approach (1b)
```julia
abstract type FileFormat end
struct hdf5 <: FileFormat end
struct binary <: FileFormat end
struct text <: FileFormat end
get_format_type(format)::FileFormat = ...
read_file(path, ::hdf5) = ...
read_file(path, ::binary) = ...
read_file(path, ::text) = ...
read_file(path, format) = read_file(path, get_format_type(format))
```
This looks similar to traits, although it doesn’t really have anything to do with traits, since no other types are involved. I would rather call it a “type enum”, i.e. an enumeration whose elements are separate types. I ran across this in several threads here on Discourse.
However, it doesn’t actually seem very effective, since it’s still necessary to define a function (here `get_format_type`) to convert the runtime value `format` to the corresponding type. So really, this just looks like manually re-implementing what `Val` does.
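To make the comparison with `Val` concrete, here is one possible `get_format_type` for (1b). It is written as a lookup table, and the table name `FORMAT_TYPES` and the stub readers are hypothetical. The table is essentially a hand-written version of the `Symbol` → type mapping that `Val(format)` gives for free.

```julia
abstract type FileFormat end
struct hdf5   <: FileFormat end
struct binary <: FileFormat end
struct text   <: FileFormat end

# Hypothetical lookup table from user-facing names to format instances.
const FORMAT_TYPES = Dict{Symbol,FileFormat}(
    :hdf5 => hdf5(), :binary => binary(), :text => text(),
)

function get_format_type(format)::FileFormat
    key = Symbol(format)  # accepts either a Symbol or a String
    haskey(FORMAT_TYPES, key) || error("Unknown format: $key")
    return FORMAT_TYPES[key]
end

# Stub readers (hypothetical) standing in for real file IO.
read_file(path, ::hdf5)   = "hdf5 reader: $path"
read_file(path, ::binary) = "binary reader: $path"
read_file(path, ::text)   = "text reader: $path"
read_file(path, format) = read_file(path, get_format_type(format))
```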
For approach (2), I also have two versions (with another two variants), but the differences here are mostly minor.
Approach (2a)
```julia
@enum FileFormat begin
    hdf5
    binary
    text
end
read_file_hdf5(path) = ...
read_file_binary(path) = ...
read_file_text(path) = ...
function read_file(path, format::FileFormat)
    if format == hdf5
        read_file_hdf5(path)
    elseif format == binary
        read_file_binary(path)
    elseif format == text
        read_file_text(path)
    else
        error("Unreachable")
    end
end
```
I found a question here on Discourse about dispatching on enum values, where the answers were to use either approach (1a) or (2a) (this one). What I don’t like about (2a) is that the function names have to carry the file format information. It seems much cleaner to keep the file format as an actual argument, as in approach (1a). Furthermore, this violates “DRY” (don’t repeat yourself) and is error-prone: in each branch you have to manually make sure to call the correct function, so copy-paste errors are easy to make.
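One possible middle ground between (1a) and (2a) keeps the `@enum` for validation but dispatches on `Val` of the enum instance. This avoids both the per-format function names and the hand-written `if` chain. It is offered as a sketch, not as an answer taken from the thread above. It relies on enum instances being plain bits values, which makes them valid type parameters. The readers are hypothetical stubs.

```julia
@enum FileFormat begin
    hdf5
    binary
    text
end

# Enum instances are isbits values, so Val{hdf5} etc. are valid types.
read_file(path, ::Val{hdf5})   = "hdf5 reader: $path"
read_file(path, ::Val{binary}) = "binary reader: $path"
read_file(path, ::Val{text})   = "text reader: $path"

# Single entry point taking the enum; only valid formats can reach it.
read_file(path, format::FileFormat) = read_file(path, Val(format))
```

One caveat: adding a new enum value without a matching method only fails at call time, with a `MethodError`.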
Approach (2b)
```julia
read_file_hdf5(path) = ...
read_file_binary(path) = ...
read_file_text(path) = ...
function read_file(path, format::Symbol)
    if format == :hdf5
        read_file_hdf5(path)
    elseif format == :binary
        read_file_binary(path)
    elseif format == :text
        read_file_text(path)
    else
        error("Unknown format")
    end
end
```
This is really the same as (2a), just using `Symbol` or `String` values instead of defining a separate enum. It is terser, but less rigorous, since `format` can now take invalid values. On the other hand, I’m glossing over how the user input is converted into an enum value. So in some sense, something equivalent to this has to be done somewhere in the program anyway.
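The conversion step that (2a) glosses over can be written once for the enum. Below is a minimal sketch, where `parse_format` is a hypothetical helper name. It uses `instances`, which lists an enum's values, and `Symbol(::Enum)`, which returns an instance's name.

```julia
@enum FileFormat begin
    hdf5
    binary
    text
end

# Hypothetical helper: map a user-supplied string (e.g. a CLI argument)
# onto a FileFormat instance by comparing against the enum's names.
function parse_format(s::AbstractString)::FileFormat
    for f in instances(FileFormat)
        Symbol(f) === Symbol(s) && return f
    end
    error("Unknown format: $s")
end
```

With this, something like `read_file(path, parse_format(ARGS[1]))` validates the input once, at the program boundary. Everything downstream then only ever sees valid enum values.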
Approach (2a’)
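```julia
@enum FileFormat begin
    hdf5
    binary
    text
end
const read_file_map = (
    hdf5 = path -> ...,
    binary = path -> ...,
    text = path -> ...,
)
# A NamedTuple is indexed by field name, so convert the enum value to its Symbol.
read_file(path, format::FileFormat) = read_file_map[Symbol(format)](path)
```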
This solves my main problems with (2a/b), but I still have to define `read_file_map` as a separate global object, which really only exists for the purposes of the `read_file` function. At this point, I feel like I’m just re-implementing what Julia’s dispatch mechanism would do for me automatically in approach (1a) anyway!
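A `NamedTuple` is indexed by field name (`Symbol`) or position, not by an enum value. An alternative to converting with `Symbol(format)` is therefore to key the table by the enum itself. Below is a sketch using a `Dict`, with purely illustrative stub readers:

```julia
@enum FileFormat begin
    hdf5
    binary
    text
end

# Dict keyed directly by the enum instances, so no Symbol conversion is needed.
# The reader closures are hypothetical stubs.
const read_file_map = Dict{FileFormat,Function}(
    hdf5   => (path -> "hdf5 reader: $path"),
    binary => (path -> "binary reader: $path"),
    text   => (path -> "text reader: $path"),
)

read_file(path, format::FileFormat) = read_file_map[format](path)
```

Comparing `length(read_file_map)` with `length(instances(FileFormat))`, e.g. in a test, is one cheap way to catch a format that was added to the enum but not to the table.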
Approach (2b’)
```julia
const read_file_map = (
    hdf5 = path -> ...,
    binary = path -> ...,
    text = path -> ...,
)
read_file(path, format::Symbol) = read_file_map[format](path)
```
Again the same as (2a’), but using `Symbol` (like (2b)) instead of an enum (like (2a)).
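A runnable sketch of (2b’) with two additions for illustration, which are not from the original:
- a `haskey` check, so an unknown format fails with a clear message instead of the default field-lookup error;
- a `String` entry point for command-line input.

The readers are hypothetical stubs.

```julia
# Stub readers (hypothetical) stored in a NamedTuple keyed by format name.
const read_file_map = (
    hdf5   = path -> "hdf5 reader: $path",
    binary = path -> "binary reader: $path",
    text   = path -> "text reader: $path",
)

function read_file(path, format::Symbol)
    # haskey works on NamedTuples; reject unknown names with a clear message.
    haskey(read_file_map, format) || error("Unknown format: $format")
    return read_file_map[format](path)
end

# A command-line argument arrives as a String; convert it once.
read_file(path, format::AbstractString) = read_file(path, Symbol(format))
```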