Capture local variables to keyword arguments

Hi! I’m working on a machine learning project in multiple steps. Each step may require as input the output of some of the previous steps. I’m doing everything in a functional, immutable style, using keyword arguments to keep track of how the result of each step is plugged into subsequent steps. I’m wondering if there’s some way to avoid repeatedly writing out keyword arguments, and instead capture the local variables that have the same name as each keyword argument. Is there maybe a macro package for that?

Something like this:

    load_data() = ...
    clean_data(; data) = ...                    # do something with data
    calculate_property_xyz(; clean_data) = ...  # do some calculations

which would be called like this:

    local data = load_data()
    local clean_data = clean_data(data=data)
    local property_xyz = calculate_property_xyz(clean_data=clean_data)

This results in code that is too repetitive, so I’d like a way to capture the argument values automatically, say like this:

    local data = load_data()
    local clean_data = @capture clean_data()
    local property_xyz = @capture calculate_property_xyz()

where @capture is some hypothetical macro that assigns keyword arguments from local variables with the same name.

Using positional instead of keyword arguments would reduce the verbosity of the original code, but not eliminate it, and it would introduce the possibility of passing arguments in the wrong order.

Is there any nice macro package that can do something like this? Or perhaps I’m looking at the problem from the wrong perspective: how would I best code a multi-step process, where outputs of previous steps need to be correctly plugged in as inputs to later steps, avoiding verbosity or the introduction of a mega-struct that contains everything?
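A macro only sees the syntax of the call, not the signature of the function being called, so a zero-argument `@capture f()` has no way of knowing which keyword names to fill in. The closest macro-based approximation is to list the variable names and let the macro turn each one into a `name = name` keyword. Here is a minimal sketch; the macro name `@kw` and the stand-in function body are hypothetical, not taken from any package:

```julia
# Hypothetical macro (not from any package): @kw f(a, b) becomes f(; a = a, b = b).
macro kw(call)
    # Accept only a plain call expression such as f(a, b).
    (call isa Expr && call.head == :call) ||
        throw(ArgumentError("@kw expects a function call"))
    f = call.args[1]
    vars = call.args[2:end]
    # Every argument must be a bare variable name; it becomes the keyword name.
    all(v -> v isa Symbol, vars) ||
        throw(ArgumentError("@kw arguments must be plain variable names"))
    kwargs = [Expr(:kw, v, v) for v in vars]
    # Escape the whole call so all names resolve in the caller's scope.
    return esc(Expr(:call, f, Expr(:parameters, kwargs...)))
end

# Hypothetical step used only to demonstrate the macro:
# counts the grid points that do not appear in the cleaned data.
calculate_property_xyz(; clean_data, grid) = count(x -> !(x in clean_data), grid)

clean_data = [1, 3, 5, 8]
grid = 1:8
property_xyz = @kw calculate_property_xyz(clean_data, grid)   # 4
```

On Julia 1.5 and later, the built-in `f(; a, b)` syntax does the same job without a macro.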

How about a pipeline? You can use the |> operator to write your example very compactly:

    property_xyz = load_data() |> clean_data |> calculate_property_xyz
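For the pipeline to work, each step has to take its input as a single positional argument, because `x |> f` simply calls `f(x)`. A self-contained sketch with hypothetical stand-in bodies for the steps (invented for illustration):

```julia
# Hypothetical stand-ins; each step takes exactly one positional argument.
load_data() = [1, 2, 3, 4, 5, 6, 7]
clean_data(data) = filter(iseven, data)          # keep only the even values
calculate_property_xyz(clean_data) = sum(clean_data)

# x |> f is f(x), so this is calculate_property_xyz(clean_data(load_data())).
property_xyz = load_data() |> clean_data |> calculate_property_xyz   # 12
```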

I might be misunderstanding, but are you aware of the piping syntax?

    julia> raw_data = [1, 2, 3, 4, 5, 6, 7];

    julia> clean_data = raw_data |> d -> filter(!isodd, d) .|> d -> d^2
    3-element Array{Int64,1}:
      4
     16
     36
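The one-liner relies on how Julia parses `->`: an anonymous function's body extends as far to the right as possible, so the squaring is broadcast over the filtered vector inside the first function. With that grouping written out explicitly, plus an equivalent plain-broadcasting form for comparison (the name `clean_data_alt` is just for this sketch):

```julia
raw_data = [1, 2, 3, 4, 5, 6, 7]

# The one-liner with its implicit parentheses written out.
clean_data = raw_data |> (d -> (filter(!isodd, d) .|> (x -> x^2)))

# Equivalent: drop the odd values, then square element-wise with .^
clean_data_alt = filter(!isodd, raw_data) .^ 2   # [4, 16, 36]
```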

EDIT: Sorry for the duplicate response; @rdeits must type faster than I do and beat me by a few seconds :smile:


Pipelining is a good idea! However, it doesn’t work when a step takes multiple arguments, for example:

    load_data() = ...
    clean_data(data) = ...
    calculate_grid(clean_data) = ...
    calculate_property_xyz(clean_data, grid) = ...

which would be called like:

    let data = load_data(),
        clean_data = clean_data(data),
        grid = calculate_grid(clean_data),
        property_xyz = calculate_property_xyz(clean_data, grid)

        property_xyz   # the let block evaluates to its last expression
    end
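A runnable version of the `let` chain, with hypothetical stand-in bodies for each step. Reusing a function's name for its result works here because the right-hand side of each `let` binding is evaluated before the new variable is introduced. By contrast, `local clean_data = clean_data(...)` inside a function body makes `clean_data` local for the whole body, so the right-hand side refers to the not-yet-assigned local and throws an `UndefVarError`.

```julia
# Hypothetical stand-ins for the four steps.
load_data() = [5, 3, 8, 1]
clean_data(data) = sort(data)
calculate_grid(clean_data) = first(clean_data):last(clean_data)
# Number of grid points that do not appear in the cleaned data.
calculate_property_xyz(clean_data, grid) = count(x -> !(x in clean_data), grid)

property_xyz = let data = load_data(),
                   clean_data = clean_data(data),   # right side still refers to the function
                   grid = calculate_grid(clean_data)
    calculate_property_xyz(clean_data, grid)
end
# property_xyz == 4: grid 1:8 minus the data values 1, 3, 5, 8
```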

while what would be desirable would be something like this:

    let data = load_data(),
        clean_data = @capture(clean_data()),                # captures 1 value
        grid = @capture(calculate_grid()),                  # captures 1 value
        property_xyz = @capture(calculate_property_xyz())   # captures 2 values

        property_xyz
    end

Here, arguments would be passed automatically as long as a local variable has the same name as the parameter. Something like this could reduce verbosity and the possibility of errors without defining a mega-struct or mega-class that contains everything, while still making sure the steps are called in the right order (since using a value before it’s defined would be an error).

Is there any package or language feature I’m missing that could do something like that?

Honestly, I would prefer to write your first example (without @capture), and I would definitely prefer to read code in that style. Your example without @capture is perfectly clear: any Julia user in the world can understand what it’s doing. Your hypothetical example with @capture is completely opaque: there’s no indication of how data moves around, and no way to understand it without looking up an esoteric macro. Are you sure this is actually a problem you need to solve?

Actually, there is an upcoming language feature in 1.5 that might help. In Julia 1.5, you’ll be able to do:

    grid = calculate_grid(; clean_data)

which will expand to:

    grid = calculate_grid(clean_data = clean_data)

i.e. it will pass the variable as a keyword argument with the same name. If you often find yourself writing calculate_grid(clean_data = clean_data), then this could save you a bit of typing. See for more.
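A minimal sketch of that shorthand applied to the multi-argument example, again with hypothetical stand-in bodies and a hypothetical driver function `run_steps`. It needs Julia 1.5 or later:

```julia
# Requires Julia 1.5+: f(; x) is shorthand for f(; x = x).
# Hypothetical stand-ins, now taking keyword arguments.
calculate_grid(; clean_data) = first(clean_data):last(clean_data)
calculate_property_xyz(; clean_data, grid) = count(x -> !(x in clean_data), grid)

function run_steps(data)
    clean_data = sort(data)
    grid = calculate_grid(; clean_data)                        # calculate_grid(clean_data = clean_data)
    property_xyz = calculate_property_xyz(; clean_data, grid)  # both keywords filled from locals
    return property_xyz
end

run_steps([5, 3, 8, 1])   # 4
```

The keyword names stay visible at each call site, so the data flow remains readable without a macro.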


I see, I think you’re right.

That’s great; there’s a lot of argument passing like this in my code. Funny coincidence that the new feature is coming out soon. It’s really going to make the code cleaner.

I think this new feature is a good solution, and I’ll keep an eye out for it when it’s released. Thanks a lot!