Default argument that's not executed unless needed?

I’m trying to write a script where one of the arguments can be a list of things, a file path, or have a default. I’m using ArgParse.jl, and when a user passes e.g. --parents <stuff...> on the command line, I get an array stored in args["parents"] (with nothing in it if the --parents argument isn’t used).
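For reference, a setup roughly like the following gives that behaviour. This is only a sketch: the help text is made up, and the arguments are handed to `parse_args` by hand instead of coming from `ARGS`.

```julia
using ArgParse

s = ArgParseSettings()
@add_arg_table! s begin
    "--parents"
        nargs = '*'   # zero or more values, collected into an array
        help = "parent names, or a single path to a file listing them (illustrative text)"
end

# Flag omitted: the entry is an empty array.
args = parse_args(String[], s)
isempty(args["parents"])

# Flag given with values: the entry holds them in order.
args = parse_args(["--parents", "a", "b"], s)
args["parents"]
```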

Then inside my script, I do

    if length(args["parents"]) == 0
        parents = unique(longdf.parent_table)
    elseif length(args["parents"]) == 1 && isfile(args["parents"][1])
        parents = readlines(args["parents"][1])
    else
        parents = args["parents"]
    end

Where longdf is a dataframe. I’ve got a couple of things like this that have different defaults, so of course I’d like to write a function, something like

    function file_or_list_arg(arg, default)
        if length(arg) == 0
            return default
        elseif length(arg) == 1 && isfile(arg[1])
            return readlines(arg[1])
        else
            return arg
        end
    end

And then do

    parents = file_or_list_arg(args["parents"], unique(longdf.parent_table))

Inside the script. This works, but if I understand correctly, the default gets calculated whether or not it’s needed. It’s not such a huge burden that it makes sense to jump through a bunch of hoops, but it got me wondering if there’s a straightforward pattern to accomplish the same thing while only executing the default argument if it’s needed. Here’s a MWE (I think):

    julia> using BenchmarkTools, Statistics, Random

    julia> Random.seed!(1)

    julia> stuff = rand(10_000);

    julia> my_func(x, default) = x[1] < 0.5 ? x[1] : default
    my_func (generic function with 1 method)

    julia> stuff[1]
    0.23603334566204692

    julia> @benchmark my_func($stuff, mean($stuff)) # is there a way to prevent mean() from running if not needed?
    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     778.192 ns (0.00% GC)
      median time:      793.269 ns (0.00% GC)
      mean time:        795.902 ns (0.00% GC)
      maximum time:     1.741 μs (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     104

    julia> m = mean(stuff);

    julia> @benchmark my_func($stuff, $m)
    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     1.475 ns (0.00% GC)
      median time:      1.481 ns (0.00% GC)
      mean time:        1.483 ns (0.00% GC)
      maximum time:     4.916 ns (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     1000

The only way I could think of doing this would be to pass a function (such as mean in your example) into your main function (my_func in your example) so that the main function is free to decide when it wants to call whatever function was passed to it as an argument. For example

    my_func(f, x, default) = x[1] < 0.5 ? x[1] : f(default)
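
To make the calling convention concrete, here is a small sketch of that idea. The counter `calls`, the wrapper `counted_mean`, the `lazy_default` variant and the sample vectors are all made up for illustration; they are only there to show when the fallback actually runs.

```julia
using Statistics

# Same shape as the one-liner above: `f` is only called on the fallback branch.
my_func(f, x, default) = x[1] < 0.5 ? x[1] : f(default)

# Hypothetical helper that counts its calls, so we can see when the fallback runs.
calls = Ref(0)
counted_mean(v) = (calls[] += 1; mean(v))

low  = [0.2, 0.9, 0.4]   # first element < 0.5, so the fallback is not needed
high = [0.8, 0.1, 0.3]   # first element >= 0.5, so the fallback is needed

my_func(counted_mean, low, low)     # returns low[1]; counted_mean is never called
my_func(counted_mean, high, high)   # calls counted_mean(high) once

# Variant taking a zero-argument function (a "thunk"), which can capture
# whatever it needs and so replaces the separate `default` argument.
lazy_default(thunk, x) = x[1] < 0.5 ? x[1] : thunk()

# Because the function is the first argument, `do` syntax works at the call site.
lazy_default(low) do
    counted_mean(low)    # not evaluated: low[1] < 0.5
end

lazy_default(high) do
    counted_mean(high)   # evaluated: high[1] >= 0.5
end

calls[]   # number of times the fallback was actually computed
```

Putting the function first is the same convention Base uses for `get(f, collection, key)`, where `f` is only called when the key is missing.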

Neat! I like this idea - and I can make a generic version that just returns the default value, in case it’s something easier to calculate or already calculated:

    function file_or_list_arg(arg, default, f=x->x)
        if length(arg) == 0
            return f(default)
        elseif length(arg) == 1 && isfile(arg[1])
            return readlines(arg[1])
        else
            return arg
        end
    end
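
A quick sketch of how this might be called, covering each branch. The function is repeated so the snippet runs on its own; `parent_table` is a made-up stand-in for `longdf.parent_table`, and the temporary file stands in for a user-supplied path. Note that the default is only lazy if the raw data and the function are passed separately, as in `(parent_table, unique)`; writing `unique(parent_table)` at the call site would still compute it up front.

```julia
# Repeated from above so this snippet is self-contained.
function file_or_list_arg(arg, default, f=x->x)
    if length(arg) == 0
        return f(default)
    elseif length(arg) == 1 && isfile(arg[1])
        return readlines(arg[1])
    else
        return arg
    end
end

# Stand-in for longdf.parent_table (illustrative values, no DataFrames needed).
parent_table = ["p1", "p2", "p1", "p3"]

# Nothing passed on the command line: `unique` runs only in this case.
from_default = file_or_list_arg(String[], parent_table, unique)

# Explicit list: returned as-is, `unique` is never called.
from_list = file_or_list_arg(["a", "b"], parent_table, unique)

# Single argument naming an existing file: its lines are read.
path, io = mktemp()
write(io, "x\ny\n")
close(io)
from_file = file_or_list_arg([path], parent_table, unique)

# Already-computed default: leave `f` out and the value is returned unchanged.
precomputed = file_or_list_arg(String[], ["fallback"])
```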