I am puzzled by this one; I have spent a lot of time trying to figure it out.
I have two alternative definitions for a pair of functions. Some code using `groupby`/`combine` works with one pair of definitions but not with the other.
The two pairs of function definitions are:
- These work: just `sum` and `Statistics.std`.
- These do not work:
function my_sum(x::Vector{Float64})
sum(x)
end
function my_std(x::Vector{Float64})
Statistics.std(x)
end
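The second pair of definitions only accepts a `Vector{Float64}`. As a minimal sketch that needs only the `Statistics` standard library, and assuming each group reaches the function as a view of the column (the error message further down names a `SubArray` type), the mismatch can be reproduced without DataFrames. Here `view(x, [1, 3])` is a made-up stand-in for one group.

```julia
using Statistics

# The restrictive definitions from above
my_sum(x::Vector{Float64}) = sum(x)
my_std(x::Vector{Float64}) = Statistics.std(x)

x = [0.1, 0.5, 0.9, 0.3]   # made-up column
v = view(x, [1, 3])        # stand-in for one group: a SubArray, not a Vector

println(typeof(v))
println(v isa Vector{Float64})          # false
println(v isa AbstractVector{Float64})  # true

println(applicable(my_sum, x))  # true:  x is a Vector{Float64}
println(applicable(my_sum, v))  # false: no my_sum method for a SubArray
println(applicable(sum, v))     # true:  Base.sum is not restricted to Vector
println(applicable(my_std, v))  # false
```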
This may seem to have a trivial solution: just use the functions that work. Unfortunately, things are not so simple. I want to create single-argument functions from other functions that take several arguments. But if the code fails even when I define single-argument functions directly, as above, then it will not (and does not) work when I create them in more complex ways.
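One common way to build single-argument functions from multi-argument ones is a closure. The sketch below assumes the multi-argument function is something like `Statistics.quantile`, or `std` with a keyword argument; the helper `fix_rest` is hypothetical, while `Base.Fix2` is part of Julia's `Base`. Because the returned functions carry no argument type annotation, they accept a plain `Vector` as well as a view.

```julia
using Statistics

# Hypothetical helper: fix every argument except the first and return a
# one-argument function. The anonymous function has no type annotation,
# so it accepts Vectors, views (SubArrays), ranges, etc.
fix_rest(f, args...; kwargs...) = x -> f(x, args...; kwargs...)

q90 = fix_rest(quantile, 0.9)                       # x -> quantile(x, 0.9)
uncorrected_std = fix_rest(std; corrected = false)  # x -> std(x; corrected = false)

# Base.Fix2 does the same for the second argument of a two-argument function
q10 = Base.Fix2(quantile, 0.1)                      # x -> quantile(x, 0.1)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
v = view(data, [1, 2, 3])   # a SubArray, like a group passed by combine

println(q90(data))
println(q10(v))
println(uncorrected_std(v))
```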
So here is the code. First, this works:
using DataFrames, Statistics
letrs = ["A", "B", "C", "A", "B", "C", "A", "A", "A", "B", "B", "B",
"C", "C", "C", "C", "B", "A", "A", "C"];
df = DataFrame(
A = rand(20),
B = collect(1:20.),
C = repeat([0, 1], inner = 10),
D = repeat([1,2,3, 4], inner = 5),
E = letrs,
)
gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
# fns = Dict("Sum" => my_sum, "Std" => my_std)
fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # fix one key order so that vals lines up with ks (Dict order is unspecified)
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)
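Iteration order of a `Dict` is unspecified, so collecting `keys(fns)` into `ks` fixes some order but not necessarily the one the dictionary was written in. A minimal alternative sketch, assuming insertion order is what matters, keeps the functions in a `Vector` of `Pair`s and uses DataFrames' `source => function => destination` form to name each output column after its label. Only the construction of the specs is shown, so the block needs just `Statistics`.

```julia
using Statistics

# A Vector of Pairs keeps the order in which the entries are written
fns = ["Sum" => sum, "Std" => Statistics.std]
target_col = :A

labels = first.(fns)   # ["Sum", "Std"]
funcs  = last.(fns)    # [sum, std]

# source => function => destination, e.g. :A => sum => "A_Sum"
specs = [target_col => f => string(target_col, "_", label) for (label, f) in fns]

foreach(println, specs)
```

The specs would then be splatted into `combine(gdf, specs...)` as in the question, which should yield columns `A_Sum` and `A_Std` in that order.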
And here is what I get:
4×4 DataFrame
│ Row │ C     │ D     │ A_sum   │ A_std    │
│     │ Int64 │ Int64 │ Float64 │ Float64  │
├─────┼───────┼───────┼─────────┼──────────┤
│ 1   │ 0     │ 1     │ 2.5632  │ 0.310824 │
│ 2   │ 0     │ 2     │ 2.05255 │ 0.346137 │
│ 3   │ 1     │ 3     │ 3.25201 │ 0.312622 │
│ 4   │ 1     │ 4     │ 2.75622 │ 0.305511 │
So far, so good. Now I swap the two `fns` lines, uncommenting the one that uses `my_sum` and `my_std` and commenting out the one that uses `sum` and `Statistics.std`:
gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
fns = Dict("Sum" => my_sum, "Std" => my_std)
# fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # fix one key order so that vals lines up with ks (Dict order is unspecified)
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)
And here is what I get:
ERROR: MethodError: no method matching my_sum(::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false})
Closest candidates are:
my_sum(::Vector{Float64}) at REPL[38]:1
Stacktrace:
[1] do_call(f::typeof(my_sum), idx::Vector{Int64}, starts::Vector{Int64}, ends::Vector{Int64}, gd::GroupedDataFrame{DataFrame}, incols::Tuple{Vector{Float64}}, i::Int64)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:742
[2] _combine(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:1168
[3] combine_helper(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:589
[4] _combine_prepare(::GroupedDataFrame{DataFrame}, ::Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}, ::Vararg{Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:553
[5] #combine#390
@ C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474 [inlined]
[6] combine(::GroupedDataFrame{DataFrame}, ::Pair{Symbol, typeof(my_sum)}, ::Pair{Symbol, typeof(my_std)})
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474
[7] top-level scope
@ REPL[63]:1
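The error message shows that `combine` passes each group's slice of column `A` to the function as a `SubArray` (a view), not as a `Vector{Float64}`, so the only method, `my_sum(::Vector{Float64})`, does not apply. `sum` and `Statistics.std` work because they accept any `AbstractVector`. A minimal sketch of one possible fix, assuming real-valued columns, widens the annotation to `AbstractVector{<:Real}`; the data and group indices are made up, and `view` stands in for what `combine` passes.

```julia
using Statistics

# Annotate with an abstract type instead of the concrete Vector{Float64}:
# both Vector{Float64} and SubArray{Float64, ...} are AbstractVector{<:Real}.
my_sum(x::AbstractVector{<:Real}) = sum(x)
my_std(x::AbstractVector{<:Real}) = Statistics.std(x)

col = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up column
g = view(col, [1, 3, 5])               # made-up group: a SubArray view

println(my_sum(g))    # 9.0  (works on the view)
println(my_std(g))    # 2.0
println(my_sum(col))  # 21.0 (still works on a plain Vector)
```

Dropping the annotation entirely (`my_sum(x) = sum(x)`) would also work; in Julia an argument annotation restricts which calls a method accepts but does not by itself make the code faster.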
Any insights into this puzzling (at least to me) behavior? And how can one fix this problem?