# Why does code stop working after a seemingly trivial function redefinition?

I am puzzled by this one and have spent a lot of time trying to figure it out.

I have two alternative definitions for a pair of functions. Some code using groupby/combine works with one pair of definitions, not with the other one.

The two pairs of function definitions are:

1. These work. Just `sum` and `Statistics.std`.

2. These do not work:

``````
function my_sum(x::Vector{Float64})
    sum(x)
end

function my_std(x::Vector{Float64})
    Statistics.std(x)
end
``````

This may seem to have a trivial solution: just use the ones that work. Unfortunately, things are not so simple. I want to create single-argument functions from other functions that take several arguments. But if the code fails even when I define the single-argument functions directly, as above, then it will not (and does not) work when I create them in more complex ways.
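To make that goal concrete, here is a minimal sketch of one way to derive a one-argument function from a function with several arguments, using a closure. The names `scaled_sum` and `make_scaled_sum` are hypothetical and only serve as an illustration; they are not part of my actual code.

```julia
# Hypothetical two-argument function: sum of `x` multiplied by `factor`.
scaled_sum(x, factor) = factor * sum(x)

# Fix `factor` with a closure, leaving a function of `x` only.
make_scaled_sum(factor) = x -> scaled_sum(x, factor)

double_sum = make_scaled_sum(2.0)
result = double_sum([1.0, 2.0, 3.0])   # 2.0 * 6.0 == 12.0
```

A function built this way can be used in a `target_col => fn` pair just like `sum`.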

So here is the code. First, this works:

``````
using DataFrames, Statistics

letrs = ["A", "B", "C", "A", "B", "C", "A", "A", "A", "B", "B", "B",
         "C", "C", "C", "C", "B", "A", "A", "C"];
df = DataFrame(
    A = rand(20),
    B = collect(1:20.),
    C = repeat([0, 1], inner = 10),
    D = repeat([1,2,3, 4], inner = 5),
    E = letrs,
)
gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
# fns = Dict("Sum" => my_sum, "Std" => my_std)
fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # to preserve order, keys(fns) was unordered
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)
``````

And here is what I get:

``````
4×4 DataFrame
│ Row │ C     │ D     │ A_sum   │ A_std    │
│     │ Int64 │ Int64 │ Float64 │ Float64  │
├─────┼───────┼───────┼─────────┼──────────┤
│ 1   │ 0     │ 1     │ 2.5632  │ 0.310824 │
│ 2   │ 0     │ 2     │ 2.05255 │ 0.346137 │
│ 3   │ 1     │ 3     │ 3.25201 │ 0.312622 │
│ 4   │ 1     │ 4     │ 2.75622 │ 0.305511 │
``````

So far so good. Now I swap in the line that is commented out in the code above for the line just below it, to get:

``````
gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
fns = Dict("Sum" => my_sum, "Std" => my_std)
# fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # to preserve order, keys(fns) was unordered
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)
``````

And here is what I get:

``````
ERROR: MethodError: no method matching my_sum(::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false})
Closest candidates are:
my_sum(::Vector{Float64}) at REPL:1
Stacktrace:
 do_call(f::typeof(my_sum), idx::Vector{Int64}, starts::Vector{Int64}, ends::Vector{Int64}, gd::GroupedDataFrame{DataFrame}, incols::Tuple{Vector{Float64}}, i::Int64)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:742
 _combine(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:1168
 combine_helper(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:589
 _combine_prepare(::GroupedDataFrame{DataFrame}, ::Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}, ::Vararg{Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:553
 #combine#390
@ C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474 [inlined]
 combine(::GroupedDataFrame{DataFrame}, ::Pair{Symbol, typeof(my_sum)}, ::Pair{Symbol, typeof(my_std)})
@ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474
 top-level scope
@ REPL:1
``````

Any insights on this puzzling (at least to me) behavior? And how can one fix this problem?

You are being too specific in the signature of `my_sum`, requiring that the argument be a `Vector{Float64}`. It ends up being passed a `SubArray` and there is no matching method.
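The mismatch can be reproduced without DataFrames. The sketch below assumes that a `view` indexed by a vector of integers is a fair stand-in for what `combine` passes in; it has the same `SubArray` type that appears in the error message.

```julia
# Same restrictive signature as in the question.
my_sum(x::Vector{Float64}) = sum(x)

x = [1.0, 2.0, 3.0, 4.0]
v = view(x, [1, 3])   # SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}

is_vector = v isa Vector{Float64}            # false: a SubArray is not a Vector
is_abstract = v isa AbstractVector{Float64}  # true

# Calling my_sum on the view throws a MethodError, as in the stack trace.
threw = try
    my_sum(v)
    false
catch err
    err isa MethodError
end
```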

In most cases you don’t need to be so specific about argument types when you define a method. In fact, when a function has only one method, it is often an advantage not to annotate the argument types at all. Alternatively, you could give the argument a more general type such as `AbstractVector{<:AbstractFloat}`.
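A minimal sketch of both options follows: `my_sum` drops the annotation entirely, and `my_std` uses the more general abstract type. A `view` stands in here for the `SubArray` that `combine` passes, which is an assumption made for the sake of a self-contained example.

```julia
using Statistics

# Option 1: no type annotation at all.
my_sum(x) = sum(x)

# Option 2: a more general abstract type.
my_std(x::AbstractVector{<:AbstractFloat}) = Statistics.std(x)

x = [1.0, 2.0, 3.0, 4.0]
v = view(x, [1, 3])   # a SubArray, like the groups passed by combine

s = my_sum(v)         # 1.0 + 3.0 == 4.0
d = my_std(v)         # sample std of [1.0, 3.0], i.e. sqrt(2)
```

With either definition, `Dict("Sum" => my_sum, "Std" => my_std)` should work in the `combine` call from the question.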


Both of your suggestions worked. One thousand thanks.