Using a (computed) function in DataFrames with multiple arguments

In Julia I can do:

using Statistics

meanratio(x,y) = mean(x) / mean(y)

function calc(fun, vbl)
    fun(vbl...)  # body assumed: splat the tuple of inputs into fun
end

julia> x1=rand(10); x2=rand(10); calc(meanratio, (x1, x2))

I want fun to be flexible in terms of how many arguments it takes. In my current use case fun returns a single value, like the ratio of two means here.
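A minimal sketch of the plain-Julia pattern described above, assuming calc does nothing more than splat the tuple of inputs into the function it receives. The fixed input vectors are made up for illustration so that the results are reproducible.

```julia
using Statistics

meanratio(x, y) = mean(x) / mean(y)

# Assumed definition: pass the elements of vbl as separate positional arguments.
calc(fun, vbl) = fun(vbl...)

x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 4.0, 6.0]

r2 = calc(meanratio, (x1, x2))  # fun takes two arguments
r1 = calc(mean, (x1,))          # fun takes one argument
```

Because the tuple is splatted, the number of arguments is decided by the caller, and calc itself never needs to know it.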

Is there a way to do this with DataFrames, ideally DataFrameMacros? Obviously I want to apply this to GroupedDataFrames.

I tried something like:

@combine df  {outvar} = fun({vbl}...)

but that did not work.

Is this what you want? f below is flexible in how many positional arguments it takes:

julia> f(x...) = x
f (generic function with 1 method)

julia> f(1)
(1,)

julia> f(1, 2)
(1, 2)

julia> f(1, 2, 3)
(1, 2, 3)

Thanks, but that’s not what I meant. The point is that fun is passed as a parameter. In plain Julia it all works.

E.g. I can define:

crazyRatio(x,y,z) = mean(x) / mean(y) / std(z)


julia> x1=rand(10); x2=rand(10); x3=rand(10); 
calc(crazyRatio, (x1, x2, x3))

would work (I have not tested it just now).
So I have no “problem” in Base Julia, only with doing the same in DataFrames. Maybe it is trivial and I am blind…, but my naive

@combine df  {outvar} = fun({vbl}...)

where vbl is a tuple of Symbols referring to columns.

… did not work.

I think for this you need to use {{ vbl }} because that’s meant to splice a tuple of column values into the expression which you can then splat with ....

I don’t understand:

but then the code @combine df {outvar} = fun({vbl}...) does not pass fun as parameter…

It seems to me that the minilanguage in DataFrames.jl already works the way you want: you can specify a variable number of columns, and the function will be called with that number of arguments. Is the following not flexible enough?

using DataFrames
using Statistics

crazyRatio(x,y,z) = mean(x) / mean(y) / std(z)

# Just to show that all these things can be passed as parameters
fun = crazyRatio
vbl = [:x1, :x2, :x3]
outvar = :y

df = DataFrame(x1=rand(10), x2=rand(10), x3=rand(10))

combine(df, vbl => fun => outvar)

# Output
1×1 DataFrame
 Row │ y       
     │ Float64 
─────┼─────────
   1 │ 7.65015

Exactly, that is why I thought OP wanted to learn how to define a vararg function.

Thanks! Yes indeed, I was overcomplicating things (“new to Julia”), and I was trying to do it with DataFrameMacros.

This is what I wanted:

function calc_df(df, vbl, gr, fun)
    combine(groupby(df, gr), vbl => fun => string(fun))
    # The DataFrameMacros version below did not work this way:
    # @chain df begin
    #     @groupby {gr}
    #     @combine {string(fun)} = fun({{vbl}})
    # end
end

df = DataFrame(x1=rand(20), x2=rand(20), x3=rand(20), grp = repeat('A':'D'; inner=5))
calc_df(df, [:x1, :x2, :x3], :grp, crazyRatio)

4×2 DataFrame
 Row │ grp   crazyRatio 
     │ Char  Float64    
─────┼──────────────────
   1 │ A        1.55756
   2 │ B        3.45605
   3 │ C        5.95207
   4 │ D        2.23287

It would still be nice if it worked with DataFrameMacros (@jules?) :)

Without testing it, I think it would work with @combine {string(fun)} = fun({{vbl}}...), so you splat the tuple resulting from {{vbl}} into fun. This is nice when you have to pass other parameters as well, for example fun(first_arg, {{vbl}}...) because then you cannot do vbl => fun in the minilanguage anymore, which is otherwise concise as well.

And I still think you don’t need to do @combine {string(fun)} = but just @combine string(fun) = ... because the {} was supposed to mean something else on the left side. But I have to check the implementation again why this even works…
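For comparison, here is a hedged sketch of how an extra leading argument can be passed in plain DataFrames.jl by wrapping the call in an anonymous vararg function. The helper scaledRatio, the factor 10, and the small data set are all made up for illustration and are not from the thread.

```julia
using DataFrames, Statistics

# Hypothetical helper: one extra leading argument k, followed by the columns.
scaledRatio(k, x, y) = k * mean(x) / mean(y)

df = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0],
               x2 = [2.0, 2.0, 4.0, 4.0],
               grp = ['A', 'A', 'B', 'B'])
vbl = [:x1, :x2]

# The closure fixes the extra argument; the selected columns arrive as varargs.
res = combine(groupby(df, :grp),
              vbl => ((cols...) -> scaledRatio(10, cols...)) => :scaledRatio)
```

This stays within the `cols => function => name` form, at the cost of the extra anonymous function that the macro syntax avoids.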

Thanks! You are right on both points! I thought I had tried {{vbl}}... and that it did not work. But it does work.

So this is what I initially intended:

using DataFrames, DataFrameMacros, Chain, Statistics

function calc_df(df, vbl, gr, fun)
    @chain df begin
        @groupby {gr}
        @combine string(fun) = fun({{vbl}}...)
    end
end

crazyRatio(x,y,z) = mean(x) / mean(y) / std(z)

df = DataFrame(x1=rand(20), x2=rand(20), x3=rand(20), grp = repeat('A':'D'; inner=5))
calc_df(df, [:x1, :x2, :x3], :grp, crazyRatio)

Ah, now I remember why it did not work: passing a Tuple of symbols did not work, i.e.

 calc_df(df, (:x1, :x2, :x3), :grp, crazyRatio)

I am still lacking the needed intuition I guess…

That’s just because a tuple of symbols is not in the list of types I expect for column identifiers. Does names(df, (:x, :y, :z)) work? If it does, I should add it too; if not, I won’t :)

Doesn’t work

ERROR: MethodError: no method matching getindex(::DataFrames.Index, ::Tuple{Symbol, Symbol})

Then that column specification is not supported by DataFrames.jl
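If the column names do arrive as a tuple, one possible workaround is to convert it to a vector with collect before handing it to DataFrames.jl. This is a minimal sketch; the data frame and the summing function are made up for illustration.

```julia
using DataFrames

df = DataFrame(x1 = 1:3, x2 = 4:6, x3 = 7:9)

vbl_tuple = (:x1, :x2, :x3)
vbl = collect(vbl_tuple)  # Vector{Symbol}, a supported column selector

selected = names(df, vbl)

# The same vector works as the source in the cols => function => name form.
res = combine(df, vbl => ((a, b, c) -> sum(a) + sum(b) + sum(c)) => :total)
```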
