# Adding calculated columns to dataframe

Hello Folks,

I have a DF that constructed as follows:

``````DF = DataFrame(ID = 1:10, Col1 = rand(15:0.01:40,10), Col2 = rand(30:0.01:55,10),
Col3 = rand(20:0.01:65,10)...)
``````

I am attempting to do the following:

1. Add columns that calculate Min, Max, IQR, Mean, Median, Variance, Std. across the columns. I know I must use (using Statistics) and apply the methods, but would like to
add the calculated columns to the existing DF.

Something like

``````insertcols!(DF, 4, :Min => min.(DF[:.1:3]))

``````
1. Is there anyway to plot mean values for each row with the Min and Max as upper and lower boundaries on the same plot?

Thank you,

``````julia> df
10Γ3 DataFrame
Row β x1         x2        x3
β Float64    Float64   Float64
ββββββΌβββββββββββββββββββββββββββββββ
1 β 0.63686    0.181631  0.860774
2 β 0.441555   0.811134  0.254945
3 β 0.396693   0.676834  0.20931
4 β 0.96424    0.267536  0.962784
5 β 0.288845   0.822987  0.803151
6 β 0.193365   0.886698  0.972179
7 β 0.992748   0.636317  0.527519
8 β 0.427211   0.473838  0.847242
9 β 0.0215964  0.742059  0.32778
10 β 0.396853   0.201746  0.506313

julia> transform!(df, AsTable(:) => ByRow(maximum))
10Γ4 DataFrame
Row β x1         x2        x3        x1_x2_x3_maximum
β Float64    Float64   Float64   Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 0.63686    0.181631  0.860774          0.860774
2 β 0.441555   0.811134  0.254945          0.811134
3 β 0.396693   0.676834  0.20931           0.676834
4 β 0.96424    0.267536  0.962784          0.96424
5 β 0.288845   0.822987  0.803151          0.822987
6 β 0.193365   0.886698  0.972179          0.972179
7 β 0.992748   0.636317  0.527519          0.992748
8 β 0.427211   0.473838  0.847242          0.847242
9 β 0.0215964  0.742059  0.32778           0.742059
10 β 0.396853   0.201746  0.506313          0.506313
``````

But I want to stress that, this is anti-columnar so you might want to use DataFrameMetas.jl to speed things up. Basically, if you need everything on each row, youβre doing row-based analysis.

Thank you.

Am wondering β if I use DataFrameMeta.jl do I need to qualify anything in the line you provided or the system automatically references the package that will execute optimally?

no, you need to re-write it using things from DataFramesMeta

Okay will explore a little, working with small set β so performance should not be major issue with the instruction you provided.

Quick follow-up,

Do you know how I could round the calculated column in your example?

I am testing:

``````transform!(UnitsPer2, AsTable(:) => round.(ByRow(std), digits=3) => :StdDev)
``````

and attempted:

``````round.(transform!(UnitsPer2, AsTable(:) => ByRow(std) => :StdDev, digits =3))

``````

Any suggestions? Is all I need to complete this exploration.

the stuff inside `ByRow` is a function that will be passed the entire row as the argument. Iβm guessing you need

``````ByRow(row -> round(std(row); digits=3))
``````

This is currently `AsTable(:) => ByRow(maximum)` is currently not something you can do in DataFramesMeta.jl.

Iβm advising to do a for loop to fill columns and put those columns back to the original dataframe. the for-loop part meta can help.

1 Like

To your earlier point,

The ByRow functions seems to work on all row
elements, instead of the column subset of interest.
I suppose I could fill AsTable(2:4), to specify where
to perform the row-wise operation, yes?

1 Like

Applying your method, for rounding as

``````transform!(DF, AsTable(2:4) => ByRow(row -> round.(var(row); digits =3)) => :Var)
``````

Works well~

Yes β I now see. The formulations I put together are:

``````transform!(DF, AsTable(2:4) => ByRow(minimum) => :Min)
transform!(DF, AsTable(2:4) => ByRow(maximum) => :Max)
transform!(DF, AsTable(2:4) =>
ByRow(row -> round.(var(row); digits =3)) => :Var)
transform!(DF, AsTable(2:4) =>
ByRow(row -> round.(std(row); digits =3)) => :StdDev)
``````

Perhaps you were thinking of something more streamlined?

You can simplify it like this:

``````using DataFrames
using Statistics

DF = DataFrame(ID = 1:10, Col1 = rand(15:0.01:40,10),
Col2 = rand(30:0.01:55,10),
Col3 = rand(20:0.01:65,10))

round3(x) = round(x, digits=3)

transform(DF, AsTable(2:4) .=>
ByRow.([minimum, maximum, round3βvar, round3βstd]) .=>
[:Min, :Max, :Var, :StdDev])

# Output
10Γ8 DataFrame
Row β ID     Col1     Col2     Col3     Min      Max      Var      StdDev
β Int64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
ββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β     1    16.83    40.6     41.94    16.83    41.94  199.553   14.126
2 β     2    39.26    51.72    42.11    39.26    51.72   42.621    6.528
3 β     3    16.8     46.58    36.69    16.8     46.58  230.045   15.167
4 β     4    31.01    48.33    46.44    31.01    48.33   90.273    9.501
5 β     5    35.92    41.29    39.67    35.92    41.29    7.587    2.755
6 β     6    23.63    54.91    29.67    23.63    54.91  275.33    16.593
7 β     7    38.69    38.05    47.62    38.05    47.62   28.623    5.35
8 β     8    32.24    38.45    30.99    30.99    38.45   15.963    3.995
9 β     9    36.48    35.7     24.4     24.4     36.48   45.704    6.76
10 β    10    15.17    49.39    35.62    15.17    49.39  296.471   17.218
``````
1 Like

Thank you for sharing this, in my example I wanted to make
a deep copy over-write so I included (!). Also, wanted to show
a second :StDev column but *2. I attempted:

``````round3(x) = round(x, digits=3)
transform!(DF, AsTable(2:4) .=>
ByRow.([minimum, maximum, mean, round3βvar,
round3βstd ]) .=>
[:Min, :Max, :Var, :Mean, :StdDev, :StdDev2])
transform!(DF, AsTable(2:4) =>
ByRow(row -> (round.(std(row); digits =3)) .*2)
=> :StdDev2)
``````

Any tips?

I think the broadcast function would be better here, as our colleague has expressed. For loops from what I read are not good design features for larger code, though we are working with small tables here.

the opposite, when your data doesnβt fit in RAM, for-loop sometimes is the only reasonable choice that is also flexible enough (i.e. not `.this().that().compute()`

Thank you.

Could you point me to a resource?

Try to understand how broadcasting works in this code. For that itβs super helpful to execute the code piece by piece.

``````AsTable(2:4) .=>
ByRow.([minimum, maximum, round3βvar, round3βstd]) .=>
[:Min, :Max, :Var, :StdDev]
``````

The first line has a scalar and the second line makes a vector of 4 βByRowβ functions:

``````julia> ByRow.([minimum, maximum, round3βvar, round3βstd])
4-element Vector{ByRow}:
(::ByRow{typeof(minimum)}) (generic function with 2 methods)
(::ByRow{typeof(maximum)}) (generic function with 2 methods)
(::ByRow{ComposedFunction{typeof(round3), typeof(var)}}) (generic function with 2 methods)
(::ByRow{ComposedFunction{typeof(round3), typeof(std)}}) (generic function with 2 methods)
``````

and the third line is a vector of 4 names. So by broadcasting `=>` we are simply creating a vector of 4 βpairs of pairsβ. See what we get if we execute the whole expression given as parameter to `transform!`:

``````julia> AsTable(2:4) .=>
ByRow.([minimum, maximum, round3βvar, round3βstd]) .=>
[:Min, :Max, :Var, :StdDev]

4-element Vector{Pair{AsTable}}:
AsTable(2:4) => (ByRow{typeof(minimum)}(minimum) => :Min)
AsTable(2:4) => (ByRow{typeof(maximum)}(maximum) => :Max)
AsTable(2:4) => (ByRow{ComposedFunction{typeof(round3), typeof(var)}}(round3 β Statistics.var) => :Var)
AsTable(2:4) => (ByRow{ComposedFunction{typeof(round3), typeof(std)}}(round3 β Statistics.std) => :StdDev)
``````

So to make it work just make sure the βByRowβ vector and the names vector have the same size, and the order of βByRowβ functions corresponds to the names (in your code you have the `mean` function before `var` but the names in the reverse order):

``````transform!(DF, AsTable(2:4) .=>
ByRow.([minimum, maximum, mean, round3βvar, round3βstd, round3β(x->2x)βstd]) .=>
[:Min, :Max, :Mean, :Var, :StdDev, :StdDev2])
``````
2 Likes

Thank you very much for your thorough explanation.

Prior to reviewing it, I was able to catch the symbol
reversal and made the correction.

However, what you said provided needed context.

Best,

1 Like