@chaineach

I’d like a macro @chaineach such that:

@chaineach x begin
    #commands
end

is equivalent to

map( y -> @chain y begin
    #commands
end,  collect(x) )

Example usage: a @chain block processes a DataFrame and creates a GroupedDataFrame.
I’d then like to apply a @chain block to each sub-dataframe.

What about this:

macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end

Example:

julia> using Chain

julia> r = 1:3; x = 2; @chaineach r begin _ + x; - end
3-element Vector{Int64}:
 -3
 -4
 -5

The macro assumes that @chain is in scope.
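
If requiring `@chain` in the caller's scope is a concern, one possible variant builds the macro call by hand and points it at Chain's macro through a `GlobalRef`. This is only a sketch: the module name `ChainEach` is hypothetical, and a call built this way would presumably not be recognized by an outer `@chain` that looks for the plain `@chain` symbol.

```julia
module ChainEach  # hypothetical wrapper module, for illustration only

import Chain
export @chaineach

macro chaineach(x, ex)
    y = gensym("item")
    # Refer to Chain's macro explicitly, so the caller does not need `using Chain`.
    chaincall = Expr(:macrocall, GlobalRef(Chain, Symbol("@chain")), __source__, y, ex)
    return esc(:(map($y -> $chaincall, collect($x))))
end

end # module

using .ChainEach

doubled = @chaineach 1:3 begin
    _ * 2
end
```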

using DataFrames, Chain, TidierData


macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end


@chain begin 

    DataFrame( a=[1,1,2,2],  b=1:4, c=11:14 )

    @group_by a

    @chaineach _ begin
        
        sum(_.b) + sum(_.c)

    end

end 


ERROR: FieldError: type GroupedDataFrame has no field `b`, available fields: `parent`, `cols`, `groups`, `idx`, `starts`, `ends`, `ngroups`, `keymap`, `lazy_lock`

I’m not sure if you can expect nested @chain / @chaineach macros to work. What do @macroexpand and @macroexpand1 give?

EDIT: This works

julia> @chaineach [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
 2:2:4
 6:2:8

but this doesn’t:

julia> @chain [1:2, 3:4] begin @chaineach _ begin 2*_ end end
ERROR: MethodError: no method matching *(::StepRangeLen{Int64, Int64, Int64, Int64}, ::Vector{UnitRange{Int64}})

Could it be that @chain recognizes itself when scanning an expression? It cannot recognize @chaineach.

I think the @chain macro is coded specifically to support nesting.
I guess the same would have to be added specifically for @chaineach.

Idea: replace @chaineach by @map @chain. This way an outer @chain can see the inner one and act accordingly.

macro map(ex)
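    y = gensym()
    # ex is the call `@chain x begin ... end`; ex.args[3] is its first argument.
    # Keep the iterable in x and put the placeholder y in its place.
    x, ex.args[3] = ex.args[3], y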
    quote
        map($y -> $ex, collect($x))
    end |> esc
end
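
The `ex.args[3]` swap relies on how Julia lays out a macro-call expression: the macro name comes first, then a line-number node, then the arguments. A small sketch using only Base (quoting a macro call does not expand it, so Chain need not be loaded; `xs` is just a placeholder name):

```julia
ex = :(@chain xs begin 2 * _ end)

println(ex.head)             # macrocall
println(ex.args[1])          # @chain (the macro name, stored as a Symbol)
println(typeof(ex.args[2]))  # LineNumberNode
println(ex.args[3])          # xs (the first argument of the macro call)

# The same swap that @map performs: remember the iterable, insert a placeholder.
placeholder = gensym("item")
iterable, ex.args[3] = ex.args[3], placeholder
```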

Nesting seems to work:

julia> @map @chain [1:2, 3:4] begin 2*_ end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
 2:2:4
 6:2:8

julia> @map @chain [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
 2:2:4
 6:2:8

julia> @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
 2:2:4
 6:2:8

julia> @map @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{Vector{Int64}}:
 [2, 4]
 [6, 8]

Basically you want an easier way to write

map(x) do xi
    @chain xi begin 
        ...
    end
end

?
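
For comparison, a sketch of that written-out form placed inside an outer `@chain`. It assumes, based on Chain.jl's documented handling of nested `@chain` calls, that the outer macro replaces the `_` in `map(_)` but leaves the body of the inner `@chain` alone; the variable name `r` is arbitrary.

```julia
using Chain

doubled = @chain [1:2, 3:4] begin
    # The outer @chain fills in this underscore with the vector of ranges.
    map(_) do r
        # The inner @chain starts from r, so its own underscore refers to one range.
        @chain r begin
            2 * _
        end
    end
end
```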

FWIW, that’s how @Lincoln_Hannah’s example from above would look with DataPipes.jl:

       @p let
           StructArray( a=[1,1,2,2],  b=1:4, c=11:14 )
           group(_.a)
           map() do  __
               sum(__.b) + sum(__.c)
           end
       end

No new macros at all, and quite intuitive behavior: __ always means the result of the previous pipeline step, and doing map() do __ effectively assigns to it – starting the inner pipeline with this value.

@aplavin, using DataFrames.jl, I believe the following is equivalent to your code:

using DataPipes, DataFrames

@p let
    DataFrame(a=[1,1,2,2], b=1:4, c=11:14)
    groupby(__, :a)
    combine() do  __
        sum(__.b) + sum(__.c)
    end
end

One wrinkle here is that map is not defined for grouped data frames, so this particular example wouldn’t quite work. But overall I think this macro solves your problem:

julia> macro chaineach(iterable, chainarg)
           map_arg = gensym()
           chainblock = Expr(:macrocall, Symbol("@chain"), __source__, map_arg, chainarg)
           out = quote 
               map($iterable) do $map_arg
                   $chainblock
               end
           end
           return esc(out)
       end;

julia> x = [1, 2, 3];

julia> @chaineach x begin
           _ + 1
       end
3-element Vector{Int64}:
 2
 3
 4

julia> df = DataFrame(g = [1, 1, 2, 2], y = [1, 2, 10, 20]);

julia> gd = groupby(df, :g);

julia> gd_vec = [gdi for gdi in gd];

julia> @chaineach gd_vec begin 
           @with begin 
               sum(:y)
           end
       end
2-element Vector{Int64}:
  3
 30

Isn’t it essentially identical to the macro in my first response (modulo the collect that OP wanted to have)?

Ah you are correct. And your map solution is pretty good.

I love this syntax. I’m converting tabular historical data to a KeyedArray of vol surfaces.

using DataFrames, Chain, TidierData, AxisKeys

@chain begin

    # Many lines to get date
    @select   histDate volatility money tenor
    @arrange  histDate days money
    @group_by histDate

    @aside histDate = first.(keys(_))
    
    @map @chain _ begin

        wrapdims( :volatility,  :money,  :tenor  )
        extend_surface()
        Vol_Surface{linear_variance}()

    end 

    KeyedArray( histDate )

end

@pdeffebach your solution of a single macro @chaineach is great too. I assume the collect() call that @matthias314 added could be included so that it works directly on a GroupedDataFrame. Either way it’s a very clean syntax. Minimal brackets, minimal dummy variables.
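
A minimal sketch of that combination, assuming the do-block version of the macro with a `collect` wrapped around the iterable. It is meant for use at top level, not nested inside another `@chain`, since the outer macro would not recognize `@chaineach`.

```julia
using Chain, DataFrames

macro chaineach(iterable, chainarg)
    item = gensym("item")
    chainblock = Expr(:macrocall, Symbol("@chain"), __source__, item, chainarg)
    return esc(quote
        # collect turns the GroupedDataFrame into a vector of sub-dataframes,
        # which map can then iterate over.
        map(collect($iterable)) do $item
            $chainblock
        end
    end)
end

df = DataFrame(g = [1, 1, 2, 2], y = [1, 2, 10, 20])

totals = @chaineach groupby(df, :g) begin
    sum(_.y)
end
```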
