Why does code stop working after a seemingly trivial function redefinition?

I am puzzled by this one and have spent a lot of time trying to figure it out.

I have two alternative definitions for a pair of functions. Some code using groupby/combine works with one pair of definitions, not with the other one.

The two pairs of function definitions are

  1. These work. Just sum and Statistics.std.

  2. These do not work:

function my_sum(x::Vector{Float64})
    sum(x)
end

function my_std(x::Vector{Float64})
    Statistics.std(x)
end

This may seem to have a trivial solution: just use the ones that work. Unfortunately, things are not so simple. I want to create single-argument functions from other functions that take several arguments. But if the code fails even when I define the single-argument functions directly, as above, then it will not (and does not) work when I create them in more complex ways.

So here is the code. First, this works:


using DataFrames, Statistics

letrs = ["A", "B", "C", "A", "B", "C", "A", "A", "A", "B", "B", "B",
         "C", "C", "C", "C", "B", "A", "A", "C"];
df = DataFrame(
    A = rand(20),
    B = collect(1:20.),
    C = repeat([0, 1], inner = 10),
    D = repeat([1,2,3, 4], inner = 5),
    E = letrs,
)
gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
# fns = Dict("Sum" => my_sum, "Std" => my_std)
fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # to preserve order, keys(fns) was unordered
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)

And here is what I get:

4×4 DataFrame
│ Row │ C     │ D     │ A_sum   │ A_std    │
│     │ Int64 │ Int64 │ Float64 │ Float64  │
├─────┼───────┼───────┼─────────┼──────────┤
│ 1   │ 0     │ 1     │ 2.5632  │ 0.310824 │
│ 2   │ 0     │ 2     │ 2.05255 │ 0.346137 │
│ 3   │ 1     │ 3     │ 3.25201 │ 0.312622 │
│ 4   │ 1     │ 4     │ 2.75622 │ 0.305511 │

So far so good. Now I swap in the commented-out line in the code above for the line just below it to get:

gdf = groupby(df, [:C, :D])
group_cols = [:C, :E]
fns = Dict("Sum" => my_sum, "Std" => my_std)
# fns = Dict("Sum" => sum, "Std" => Statistics.std)
target_col = :A
ks = [k for k in keys(fns)] # to preserve order, keys(fns) was unordered
vals = [fns[k] for k in ks]
summ = combine(gdf, [target_col => fn for fn in vals]...)

And here is what I get:

ERROR: MethodError: no method matching my_sum(::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false})
Closest candidates are:
  my_sum(::Vector{Float64}) at REPL[38]:1
Stacktrace:
 [1] do_call(f::typeof(my_sum), idx::Vector{Int64}, starts::Vector{Int64}, ends::Vector{Int64}, gd::GroupedDataFrame{DataFrame}, incols::Tuple{Vector{Float64}}, i::Int64)    
   @ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:742
 [2] _combine(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}, copycols::Bool, keeprows::Bool)
   @ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:1168
 [3] combine_helper(f::Vector{Pair}, gd::GroupedDataFrame{DataFrame}, nms::Vector{Symbol}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
   @ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:589
 [4] _combine_prepare(::GroupedDataFrame{DataFrame}, ::Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}, ::Vararg{Union{Colon, typeof(nrow), Regex, AbstractString, Signed, Symbol, Unsigned, AbstractVector, Pair, All, Between, InvertedIndex}}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool)
   @ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:553
 [5] #combine#390
   @ C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474 [inlined]
 [6] combine(::GroupedDataFrame{DataFrame}, ::Pair{Symbol, typeof(my_sum)}, ::Pair{Symbol, typeof(my_std)})
   @ DataFrames C:\Users\fsald\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:474
 [7] top-level scope
   @ REPL[63]:1

Any insights on this puzzling (at least to me) behavior? And how can one fix this problem?

You are being too specific in the signature of my_sum, requiring that the argument be a Vector{Float64}. It ends up being passed a SubArray and there is no matching method.
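The mismatch can be reproduced without DataFrames. The following is a minimal sketch, assuming that a view indexed by a vector of integers stands in for what combine passes for each group (the type it produces is the one named in the error message). The name strict_sum is hypothetical and only mirrors the restrictive signature from the question.

```julia
# Hypothetical function with the same restrictive signature as my_sum in the question.
strict_sum(x::Vector{Float64}) = sum(x)

v = [1.0, 2.0, 3.0, 4.0]
sub = view(v, [1, 3])   # a SubArray indexed by a Vector{Int64}

println(typeof(sub))                      # a SubArray type, not Vector{Float64}
println(sub isa Vector{Float64})          # false
println(sub isa AbstractVector{Float64})  # true

# Calling strict_sum(sub) throws a MethodError; catch it to make that visible.
failed = try
    strict_sum(sub)
    false
catch err
    err isa MethodError
end
println(failed)  # true
```

A SubArray is an AbstractVector but not a Vector, so a method restricted to Vector{Float64} never matches it.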

In most cases you don't need to be so specific about argument types when you define a method. In fact, it is often an advantage not to assign types to arguments if there is only one method. Or you could give it a more general type like AbstractVector{<:AbstractFloat}.
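Both suggestions can be sketched as follows, again using a view as a stand-in for the per-group data. The two-argument function scaled_sum and the closure half_sum are hypothetical names, added only to illustrate building a one-argument function from a function with several arguments, as the question intends.

```julia
using Statistics

# Suggestion 1: no type annotation on the argument.
my_sum(x) = sum(x)

# Suggestion 2: a more general abstract type.
my_std(x::AbstractVector{<:AbstractFloat}) = Statistics.std(x)

# Hypothetical two-argument function, reduced to one argument with a closure.
scaled_sum(x::AbstractVector{<:Real}, scale::Real) = scale * sum(x)
half_sum = x -> scaled_sum(x, 0.5)

v = [1.0, 2.0, 3.0, 4.0]
sub = view(v, [1, 3])   # SubArray, as in the error message

println(my_sum(sub))    # 4.0
println(my_std(sub))    # standard deviation of [1.0, 3.0]
println(half_sum(sub))  # 2.0
```

With definitions like these, the Dict of functions in the question should be usable unchanged, since each function now accepts the SubArray it receives for every group.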


Both of your suggestions worked. One thousand thanks.