I’d like a macro @chaineach such that:
@chaineach x begin
    #commands
end
is equivalent to
map( y -> @chain y begin
    #commands
end, collect(x) )
An example use case is a @chain block that processes a DataFrame and creates a GroupedDataFrame.
I’d then like to apply a @chain block to each sub-dataframe.
What about this:
macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end
Example:
julia> using Chain
julia> r = 1:3; x = 2; @chaineach r begin _ + x; - end
3-element Vector{Int64}:
-3
-4
-5
The macro assumes that @chain is in scope.
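To see the gensym/esc pattern of this macro in isolation, here is a minimal sketch that needs no packages. The macro name `@squareeach` is made up for illustration; it only mirrors the shape of `@chaineach`, with a fixed body in place of the `@chain` call.

```julia
# Hypothetical macro for illustration only; it is not part of Chain.jl.
macro squareeach(x)
    y = gensym()                 # fresh name that cannot collide with user variables
    quote
        map($y -> $y^2, collect($x))
    end |> esc                   # escape so that `x` is resolved in the caller's scope
end

r = 1:3
squares = @squareeach r
println(squares)
```

Because the whole quote is escaped, any variable used inside the block (such as `x` in the example above) is looked up where the macro is called, and `gensym` keeps the lambda argument from shadowing it.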
using DataFrames, Chain, TidierData
macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end
@chain begin
    DataFrame( a=[1,1,2,2], b=1:4, c=11:14 )
    @group_by a
    @chaineach _ begin
        sum(_.b) + sum(_.c)
    end
end
ERROR: FieldError: type GroupedDataFrame has no field `b`, available fields: `parent`, `cols`, `groups`, `idx`, `starts`, `ends`, `ngroups`, `keymap`, `lazy_lock`
I’m not sure if you can expect nested @chain / @chaineach macros to work. What do @macroexpand and @macroexpand1 give?
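For anyone unsure how the two differ, here is a small sketch with two made-up macros (`@addone` and `@double_addone` are hypothetical and unrelated to Chain.jl). `@macroexpand1` expands only the outermost macro, so a nested macro call is still visible in the result, while `@macroexpand` expands everything.

```julia
# Two hypothetical macros, one calling the other.
macro addone(x)
    :($(esc(x)) + 1)
end

macro double_addone(x)
    :(@addone($(esc(x))) * 2)
end

# Helper: does an expression still contain an unexpanded macro call?
has_macrocall(ex) = ex isa Expr && (ex.head === :macrocall || any(has_macrocall, ex.args))

one_level = @macroexpand1 @double_addone 3   # only @double_addone is expanded
full      = @macroexpand  @double_addone 3   # @addone is expanded as well

println(has_macrocall(one_level))
println(has_macrocall(full))
```

Applied to the failing example, the one-level expansion of the outer @chain should show what has happened to the underscores inside the @chaineach block before @chaineach itself runs.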
EDIT: This works
julia> @chaineach [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
but this doesn’t:
julia> @chain [1:2, 3:4] begin @chaineach _ begin 2*_ end end
ERROR: MethodError: no method matching *(::StepRangeLen{Int64, Int64, Int64, Int64}, ::Vector{UnitRange{Int64}})
Could it be that @chain recognizes itself when scanning an expression? It cannot recognize @chaineach.
I think the @chain macro is coded specifically with nesting ability
I guess the same would have to be added specifically for @chaineach
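A rough sketch of the mechanism being guessed at here, in plain Julia. The helper `replace_underscores` is hypothetical and is not Chain.jl's actual code. It assumes the outer macro rewrites every `_` it finds, except inside macro calls it recognizes as nested chains, where it touches only the first argument.

```julia
# Hypothetical helper; an assumption about the behavior, not Chain.jl's implementation.
function replace_underscores(ex, sym; nested = (Symbol("@chain"),))
    ex === :_ && return sym
    ex isa Expr || return ex
    if ex.head === :macrocall && ex.args[1] in nested
        # args are: macro name, LineNumberNode, then the macro's arguments.
        # Only the inner chain's input (args[3]) is rewritten; its body keeps its own `_`.
        new_args = copy(ex.args)
        if length(new_args) >= 3
            new_args[3] = replace_underscores(new_args[3], sym; nested = nested)
        end
        return Expr(:macrocall, new_args...)
    end
    # Any other expression, including an unknown macro call, is rewritten all the way down.
    return Expr(ex.head, (replace_underscores(a, sym; nested = nested) for a in ex.args)...)
end

has_underscore(ex) = ex === :_ || (ex isa Expr && any(has_underscore, ex.args))

known   = replace_underscores(:(@chain _ begin 2 * _ end), :prev)
unknown = replace_underscores(:(@chaineach _ begin 2 * _ end), :prev)

println(has_underscore(known))     # the inner body still has its own `_`
println(has_underscore(unknown))   # every `_` was replaced by `prev`
```

Under this assumption, the `2*_` inside the @chaineach block would end up referring to the outer value, which is consistent with the MethodError above mentioning a `Vector{UnitRange{Int64}}`.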
Idea: replace @chaineach by @map @chain. This way an outer @chain can see the inner one and act accordingly.
macro map(ex)
    y = gensym()
    x, ex.args[3] = ex.args[3], y
    quote
        map($y -> $ex, collect($x))
    end |> esc
end
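The indexing `ex.args[3]` relies on the layout of a macro call expression: the macro name, a line number node, and then the macro's arguments. A short sketch of that layout follows. Quoting does not expand the macro, so this runs without Chain.jl loaded; the variable name `xs` is just a placeholder.

```julia
# Quote a @chain call without expanding it.
ex = :(@chain xs begin
    2 * _
end)

println(ex.head)                      # the expression head of a macro call
println(ex.args[1])                   # the macro name
println(ex.args[2] isa LineNumberNode)
println(ex.args[3])                   # the first argument passed to @chain

# The same swap that @map performs: take out the collection and
# put a fresh symbol in its place.
y = gensym()
x, ex.args[3] = ex.args[3], y
```

After the swap, `x` holds the collection expression and `ex` is a @chain call on the fresh symbol, ready to be used as the body of the lambda. This also means @map, as written, expects its argument to be a macro call whose first argument is the collection.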
Nesting seems to work:
julia> @map @chain [1:2, 3:4] begin 2*_ end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @map @chain [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @map @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{Vector{Int64}}:
[2, 4]
[6, 8]
Basically you want an easier way to write
map(x) do xi
    @chain xi begin
        ...
    end
end
?
FWIW, this is how @Lincoln_Hannah’s example from above would look with DataPipes.jl:
@p let
    StructArray( a=[1,1,2,2], b=1:4, c=11:14 )
    group(_.a)
    map() do __
        sum(__.b) + sum(__.c)
    end
end
No new macros at all, and quite intuitive behavior: __ always means the result of the previous pipeline step, and doing map() do __ effectively assigns to it – starting the inner pipeline with this value.
@aplavin, using DataFrames.jl, I believe the following is equivalent to your code:
using DataPipes, DataFrames
@p let
    DataFrame(a=[1,1,2,2], b=1:4, c=11:14)
    groupby(__, :a)
    combine() do __
        sum(__.b) + sum(__.c)
    end
end
One wrinkle here is that map is not defined for grouped data frames, so this particular example wouldn’t quite work. But overall I think this macro solves your problem:
julia> using Chain, DataFrames, DataFramesMeta
julia> macro chaineach(iterable, chainarg)
           map_arg = gensym()
           chainblock = Expr(:macrocall, Symbol("@chain"), 1, map_arg, chainarg)
           out = quote
               map($iterable) do $map_arg
                   $chainblock
               end
           end
           return esc(out)
       end;
julia> x = [1, 2, 3];
julia> @chaineach x begin
           _ + 1
       end
3-element Vector{Int64}:
2
3
4
julia> df = DataFrame(g = [1, 1, 2, 2], y = [1, 2, 10, 20]);
julia> gd = groupby(df, :g);
julia> gd_vec = [gdi for gdi in gd];
julia> @chaineach gd_vec begin
           @with begin
               sum(:y)
           end
       end
2-element Vector{Int64}:
3
30
Isn’t it essentially identical to the macro in my first response (modulo the collect that OP wanted to have)?
Ah you are correct. And your map solution is pretty good.
I love this syntax. I’m converting tabular historical data to a KeyedArray of vol surfaces.
using DataFrames, Chain, TidierData, AxisKeys
@chain begin
    # Many lines to get date
    @select histDate volatility money tenor
    @arrange histDate days money
    @group_by histDate
    @aside histDate = first.(keys(_))
    @map @chain _ begin
        wrapdims( :volatility, :money, :tenor )
        extend_surface()
        Vol_Surface{linear_variance}()
    end
    KeyedArray( histDate )
end
@pdeffebach your solution of a single macro @chaineach is great too. I assume the collect() function that @matthias314 added could be included so that it works directly on a GroupedDataFrame. Either way it’s a very clean syntax. Minimal brackets, minimal dummy variables.