Prefixing output columns which are returned as a DataFrame

I am a bit stuck: suppose I have a function that takes Vectors and returns a DataFrame with as many rows as the Vectors have elements (e.g. my calc function below).

function calc(x1,x2) 
    y1 = accumulate(+, x1+x2 .+ 0.1rand())
    y2 = accumulate(*, y1)
    y3 = accumulate(+, y2 ./ x1)
    DataFrame(;y1, y2, y3)
end

Then I can use => AsTable so that the column names of the returned DataFrame become the new column names. How can I programmatically add a suffix to distinguish repeated calculations? I came up with:

d = DataFrame(x1=rand(100), x2=rand(100))

#transform(d, [:x1, :x2] => ((x,y) -> calc(x,y)) => AsTable) # Standard way
#transform(d, [:x1, :x2] => ((x,y) -> calc(x,y)) => ["y1","y2","y3"] .* "_mycalc") # manually
transform(d, [:x1, :x2] => ((x,y) -> calc(x,y)) => names(calc(rand(2), rand(2))) .* "_mycalc") # The result I want
  1. This is rather a workaround (it calls the function once with dummy input just to get the names)… Is there a better way to access the names?
  2. I can't get this to work at all with DataFrameMacros, which I like very much.

Thanks for any help!

Currently the output column names are computed dynamically only if you use AsTable as the target. In all other cases they are computed statically (i.e. before the transformation function is called). So currently the only way to do it is to define calc differently, e.g. like this:

julia> function calc2(suffix)
           return function(x1, x2)
               y1 = accumulate(+, x1+x2 .+ 0.1rand())
               y2 = accumulate(*, y1)
               y3 = accumulate(+, y2 ./ x1)
               return DataFrame(Any[y1, y2, y3], string.("y", 1:3, suffix))
           end
       end
calc2 (generic function with 1 method)

julia> transform(d, [:x1, :x2] => calc2("_mycalc") => AsTable)
100×5 DataFrame
 Row │ x1          x2          y1_mycalc  y2_mycalc        y3_mycalc
     │ Float64     Float64     Float64    Float64          Float64
─────┼──────────────────────────────────────────────────────────────
   1 │ 0.0344154   0.435705      0.54877      0.54877         15.9455
   2 │ 0.35864     0.116174      1.10223      0.604873        17.6321
   3 │ 0.945627    0.16657       2.29308      1.38702         19.0988
   4 │ 0.489643    0.145826      3.0072       4.17106         27.6174
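
Because the suffix is an argument of calc2, the same pattern should also cover the repeated-calculation case from the question. A minimal sketch, assuming the calc2 definition from this post; the suffixes "_run1" and "_run2" are made up for illustration:

```julia
using DataFrames

# Closure-returning version from the post above: the suffix is fixed
# when the transformation function is created.
function calc2(suffix)
    return function (x1, x2)
        y1 = accumulate(+, x1 + x2 .+ 0.1rand())
        y2 = accumulate(*, y1)
        y3 = accumulate(+, y2 ./ x1)
        return DataFrame(Any[y1, y2, y3], string.("y", 1:3, suffix))
    end
end

d = DataFrame(x1 = rand(100), x2 = rand(100))

# Two runs of the same calculation, kept apart by their suffixes
# ("_run1" and "_run2" are illustrative names only).
res = transform(d,
    [:x1, :x2] => calc2("_run1") => AsTable,
    [:x1, :x2] => calc2("_run2") => AsTable)

names(res)
```

Without distinct suffixes the second run would produce the same column names as the first, so the suffix is what keeps the two sets of results side by side.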

Thanks for the explanation and hint!
I think I would then prefer something like the following, which does not require changing the original function:

addSuffix(df::DataFrame, s) = rename(df, names(df) .* s)
transform(d, [:x1, :x2] => ((x,y) -> addSuffix(calc(x,y), "_mycalc")) => AsTable) # rename inside the anonymous function
#or
transform(d, [:x1, :x2] => ((x,y) -> @chain calc(x,y) addSuffix("_mycalc")) => AsTable) # with @chain

I would still be happy to hear about a DataFrameMacros solution… @jules

Would it be an option for calc() to return a named tuple instead?
Then the code could be simplified:

function calc(x1,x2) 
    y1 = accumulate(+, x1+x2 .+ 0.1rand())
    y2 = accumulate(*, y1)
    y3 = accumulate(+, y2 ./ x1)
    return (y1_calc=y1, y2_calc=y2, y3_calc=y3)
end

d = DataFrame(x1=rand(100), x2=rand(100))

hcat(d, DataFrame(calc(d.x1,d.x2)))

Thanks, but that would not allow adding the suffix programmatically. Otherwise, it is a good “design” question. I thought I wanted to stay within the well-defined DataFrame API framework, but I am happy to hear suggestions.
The use case is running dynamic models with several states forced by variables in a DataFrame.

Julia is the mother API, but with the mini-language perhaps you could do this instead:

transform(d, [:x1, :x2] => ((x1,x2) -> calc(x1, x2)) => AsTable)

Which is what “we” do above… But your solution does not allow the programmatic addition of the suffix. That's why @bkamins and I had solutions with the suffix as a function parameter.

The base operation is

@transform(df, AsTable = @bycol calc(:x1, :x2))

but how AsTable works here cannot be modified due to the aforementioned limitations in the dispatch structure of the mini language. This is not something where DataFrameMacros could add convenience on top.

Another option to do it with base DataFrames, and without adding a suffix parameter to calc:

julia> suffixer(s) = df -> rename(n -> n * s, df)
suffixer (generic function with 1 method)

julia> transform(df, [:x1, :x2] => suffixer("_suf") ∘ calc => AsTable)
100×5 DataFrame
 Row │ x1          x2         y1_suf     y2_suf           y3_suf
     │ Float64     Float64    Float64    Float64          Float64
─────┼─────────────────────────────────────────────────────────────
   1 │ 0.708036    0.975863     1.76416      1.76416          2.49162
   2 │ 0.684251    0.0320959    2.56076      4.51759          9.09386
   3 │ 0.334323    0.49034      3.46568     15.6565          55.9244
   4 │ 0.0376339   0.699219     4.28279     67.0537        1837.66
   5 │ 0.85722     0.0254397    5.24571    351.744         2247.99
   6 │ 0.650973    0.426887     6.40383   2252.51          5708.21
   7 │ 0.432189    0.721066     7.63734  17203.2          45513.0
   8 │ 0.323039    0.221529     8.26217      1.42136e5        4.85509e5

This could be improved so it works not just with dataframes as intermediate results, but also namedtuples, e.g.
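
A minimal sketch of how that generalisation might look, not taken from the post above: the helper name addsuffix is hypothetical, and it simply gets one method per intermediate result type, so the same suffixer can be composed with a calc that returns either a data frame or a named tuple of vectors.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): one method per result type.
# Data frames are renamed column by column ...
addsuffix(df::AbstractDataFrame, s) = rename(n -> n * s, df)
# ... and named tuples are rebuilt with new keys and the same values.
addsuffix(nt::NamedTuple, s) =
    NamedTuple{Symbol.(string.(keys(nt)), s)}(values(nt))

# Same shape as the suffixer above, but dispatching on the intermediate result.
suffixer(s) = x -> addsuffix(x, s)

nt = suffixer("_suf")((y1 = [1, 2], y2 = [3, 4]))
df = suffixer("_suf")(DataFrame(y1 = [1, 2], y2 = [3, 4]))
```

Used as suffixer("_suf") ∘ calc with an AsTable target, this should behave the same whichever of the two types calc returns.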

Based on a master's solution here, I have adapted the code to handle it:

using DataFrames

function calc(x1,x2, suffix)
  y1 = accumulate(+, x1+x2 .+ 0.1rand())
  y2 = accumulate(*, y1)
  y3 = accumulate(+, y2 ./ x1)
  mynames = Symbol.((:y1, :y2, :y3), "_$suffix")
  return (;zip(mynames, (y1,y2,y3))...)
end

d = DataFrame(x1=rand(100), x2=rand(100))

hcat(d, DataFrame(calc(d.x1, d.x2, "mycalc")))
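
Since transform accepts a named tuple of vectors when the target is AsTable, this version should also work inside the mini-language and not only with hcat. A sketch under that assumption, repeating the three-argument calc from this post so that the block is self-contained:

```julia
using DataFrames

# Three-argument calc from this post: the suffix is baked into the keys
# of the returned named tuple.
function calc(x1, x2, suffix)
    y1 = accumulate(+, x1 + x2 .+ 0.1rand())
    y2 = accumulate(*, y1)
    y3 = accumulate(+, y2 ./ x1)
    mynames = Symbol.((:y1, :y2, :y3), "_$suffix")
    return (; zip(mynames, (y1, y2, y3))...)
end

d = DataFrame(x1 = rand(100), x2 = rand(100))

# The suffix is passed through an anonymous function; AsTable takes the
# column names from the keys of the returned named tuple.
res = transform(d, [:x1, :x2] => ((x1, x2) -> calc(x1, x2, "mycalc")) => AsTable)

names(res)
```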