I’d like a macro @chaineach such that:
@chaineach x begin
    # commands
end
is equivalent to
map(y -> @chain y begin
    # commands
end, collect(x))
An example use case is a @chain block that processes a DataFrame and creates a GroupedDataFrame.
I’d then like to apply a @chain block to each sub-dataframe.
What about this:
macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end
Example:
julia> using Chain
julia> r = 1:3; x = 2; @chaineach r begin _ + x; - end
3-element Vector{Int64}:
-3
-4
-5
The macro assumes that @chain is in scope.
using DataFrames, Chain, TidierData
macro chaineach(x, ex)
    y = gensym()
    quote
        map($y -> @chain($y, $ex), collect($x))
    end |> esc
end
@chain begin
    DataFrame( a=[1,1,2,2], b=1:4, c=11:14 )
    @group_by a
    @chaineach _ begin
        sum(_.b) + sum(_.c)
    end
end
ERROR: FieldError: type GroupedDataFrame has no field `b`, available fields: `parent`, `cols`, `groups`, `idx`, `starts`, `ends`, `ngroups`, `keymap`, `lazy_lock`
I’m not sure if you can expect nested @chain / @chaineach macros to work. What do @macroexpand and @macroexpand1 give?
EDIT: This works:
julia> @chaineach [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
but this doesn’t:
julia> @chain [1:2, 3:4] begin @chaineach _ begin 2*_ end end
ERROR: MethodError: no method matching *(::StepRangeLen{Int64, Int64, Int64, Int64}, ::Vector{UnitRange{Int64}})
Could it be that @chain recognizes itself when scanning an expression? It cannot recognize @chaineach.
I think the @chain macro is specifically coded to handle nesting.
I guess the same handling would have to be added specifically for @chaineach.
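For background on why the outer macro gets a say at all: Julia expands the outermost macro first and hands it any inner macro call as an unexpanded `:macrocall` expression, including the inner macro's name. Here is a minimal sketch using only Base; `@head_of`, `@callee_of` and `@inner` are made-up names for illustration and have nothing to do with Chain.jl's internals.

```julia
# Hypothetical helper macros, for illustration only (not part of Chain.jl).

# Returns the head of the expression it was handed, without evaluating it.
macro head_of(ex)
    return QuoteNode(ex.head)
end

# Returns the name of the macro called in the expression it was handed.
macro callee_of(ex)
    return QuoteNode(ex.args[1])
end

# A do-nothing inner macro that just returns its argument.
macro inner(ex)
    return esc(ex)
end

# The outer macro receives `@inner 1 + 2` unexpanded.
h = @head_of @inner 1 + 2        # :macrocall
name = @callee_of @inner 1 + 2   # Symbol("@inner")
```

So an outer macro can check the name of a nested macro call and treat some names specially, while a name it does not know about gets whatever default rewriting it applies to ordinary expressions.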
Idea: replace @chaineach by @map @chain. This way an outer @chain can see the inner one and act accordingly.
macro map(ex)
    y = gensym()
    x, ex.args[3] = ex.args[3], y
    quote
        map($y -> $ex, collect($x))
    end |> esc
end
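The index 3 relies on how Julia stores a macro call: the macro name comes first, then a line-number node, then the actual arguments. A small sketch that inspects a quoted `@chain` call (quoting does not expand the macro, so Chain.jl does not need to be loaded here; `xs` and `item` are placeholder names):

```julia
# Quote a macro call without expanding it.
ex = :(@chain xs begin
    2 * _
end)

head = ex.head                          # :macrocall
macro_name = ex.args[1]                 # Symbol("@chain")
has_line = ex.args[2] isa LineNumberNode
first_arg = ex.args[3]                  # :xs, the data fed into the chain

# The same swap that the macro performs: pull out the data expression
# and put a fresh placeholder symbol in its place.
item = gensym("item")
source, ex.args[3] = ex.args[3], item
```

This also shows the assumption baked into the macro: its argument has to be a macro call whose first argument is the data, as in `@chain data begin ... end`.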
Nesting seems to work:
julia> @map @chain [1:2, 3:4] begin 2*_ end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @map @chain [1:2, 3:4] begin @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{StepRangeLen{Int64, Int64, Int64, Int64}}:
2:2:4
6:2:8
julia> @map @chain [1:2, 3:4] begin @map @chain _ begin 2*_ end end
2-element Vector{Vector{Int64}}:
[2, 4]
[6, 8]
Basically you want an easier way to write
map(x) do xi
    @chain xi begin
        ...
    end
end
?
FWIW, this is how @Lincoln_Hannah’s example from above would look with DataPipes.jl:
@p let
    StructArray( a=[1,1,2,2], b=1:4, c=11:14 )
    group(_.a)
    map() do __
        sum(__.b) + sum(__.c)
    end
end
No new macros at all, and quite intuitive behavior: __ always means the result of the previous pipeline step, and doing map() do __ effectively assigns to it – starting the inner pipeline with this value.
@aplavin, using DataFrames.jl, I believe the following is equivalent to your code:
using DataPipes, DataFrames
@p let
    DataFrame(a=[1,1,2,2], b=1:4, c=11:14)
    groupby(__, :a)
    combine() do __
        sum(__.b) + sum(__.c)
    end
end
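One wrinkle here is that map is not defined for grouped data frames, so this particular example wouldn’t quite work. But overall I think this macro solves your problem:
julia> macro chaineach(iterable, chainarg)
           # Fresh name for the element passed to each inner @chain call
           map_arg = gensym()
           # Build `@chain map_arg chainarg`; the second slot of a macro call holds
           # its source location, so pass this macro's own __source__ along
           chainblock = Expr(:macrocall, Symbol("@chain"), __source__, map_arg, chainarg)
           out = quote
               map($iterable) do $map_arg
                   $chainblock
               end
           end
           return esc(out)
       end;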
julia> x = [1, 2, 3];
julia> @chaineach x begin
           _ + 1
       end
3-element Vector{Int64}:
2
3
4
julia> df = DataFrame(g = [1, 1, 2, 2], y = [1, 2, 10, 20]);
julia> gd = groupby(df, :g);
julia> gd_vec = [gdi for gdi in gd];
julia> @chaineach gd_vec begin
           @with begin
               sum(:y)
           end
       end
2-element Vector{Int64}:
3
30
Isn’t it essentially identical to the macro in my first response (modulo the collect that OP wanted to have)?
Ah you are correct. And your map solution is pretty good.
I love this syntax. I’m converting tabular historical data to a KeyedArray of vol surfaces.
using DataFrames, Chain, TidierData, AxisKeys
@chain begin
    # Many lines to get date
    @select histDate volatility money tenor
    @arrange histDate days money
    @group_by histDate
    @aside histDate = first.(keys(_))
    @map @chain _ begin
        wrapdims( :volatility, :money, :tenor )
        extend_surface()
        Vol_Surface{linear_variance}()
    end
    KeyedArray( histDate )
end
@pdeffebach your single-macro @chaineach solution is great too. I assume the collect() call that @matthias314 added could be included so that it works directly on a GroupedDataFrame. Either way, it’s a very clean syntax: minimal brackets, minimal dummy variables.
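A minimal sketch of that combination, assuming Chain.jl is installed: it is the do-block version of the macro from above with collect() wrapped around the iterable. The input `ranges` is a made-up example, and I have only reasoned this through on plain collections, not on a GroupedDataFrame.

```julia
using Chain  # assumed to be installed; provides @chain

macro chaineach(iterable, chainarg)
    # Fresh name for the element passed to each inner @chain call
    map_arg = gensym("item")
    # Build `@chain item chainarg`, forwarding this macro's source location
    chainblock = Expr(:macrocall, Symbol("@chain"), __source__, map_arg, chainarg)
    return esc(quote
        # collect() turns any iterable into a Vector that map accepts
        map(collect($iterable)) do $map_arg
            $chainblock
        end
    end)
end

# Made-up input: a tuple of ranges, collected into a Vector before mapping
ranges = (1:2, 3:4)
result = @chaineach ranges begin
    sum(_)
    _ * 10
end
# result == [30, 70]
```

Like the earlier versions, this only works at the top level or inside ordinary code. Inside an outer @chain block it hits the same nesting issue discussed above, which is what the @map @chain form avoids.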