Efficient way to add a value of column to all rows conditioned on another column by group

Rahul · November 19, 2021, 3:00pm

My data has 3 columns (id, time, conc). Within each id group, I want to add a column that holds the value of conc at time == 20 on every row of that group, and I am looking for a concise way to do this with Chain.jl and DataFramesMeta.jl. My current code is below:

using DataFrames, DataFramesMeta, Chain

df = DataFrame(id = sort(repeat([1, 2, 3], 10)),
               time = repeat(0:10:90, 3),
               conc = rand(30))

@chain df begin
    @aside concs_20 = @chain _ begin
        @rsubset :time == 20
        @select :id :conc_20hr = :conc
    end
    leftjoin(_, concs_20, on=[:id])
end

For example, in R, I can do the task concisely (using the library data.table) as follows:

library(data.table)

df = data.table(id = sort(rep(c(1, 2, 3), 10)),
                time = rep(seq(0, 90, 10), 3),
                conc = rnorm(30))
df[, ":="(conc_20hr = conc[time == 20]), by=.(id)]

pdeffebach · November 19, 2021, 3:22pm

This is an area of active development in DataFramesMeta, and I wish there were a better way of doing it.

The way to do the conditional transformation is:

julia> @chain df begin 
           @rtransform :conc_20hr = :time == 20 ? :conc : missing
       end

But it looks like you also want to “spread” the result within the group:

julia> @chain df begin 
           groupby(:id)
           @transform :conc_20hr = first(:conc[:time .== 20])
       end

bkamins · November 19, 2021, 9:53pm

The last example matches what OP has written in data.table. I would just make one small substitution: replace first with only, as it is safer (it throws an error unless there is exactly one match for :time .== 20, so you can be sure the value is unambiguous).
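
For reference, a minimal sketch of that substitution, reusing the example data from the first post. The variable name result is just chosen here for illustration; only is a Base function.

```julia
using DataFrames, DataFramesMeta, Chain

df = DataFrame(id = sort(repeat([1, 2, 3], 10)),
               time = repeat(0:10:90, 3),
               conc = rand(30))

# `only` throws unless exactly one row in the group has time == 20,
# whereas `first` would silently take the first of several matches.
result = @chain df begin
    groupby(:id)
    @transform :conc_20hr = only(:conc[:time .== 20])
end
```

If a group had no row (or more than one row) with time == 20, this version would raise an error instead of returning a possibly wrong value.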

DataFrames · November 20, 2021, 12:49am

Actually, your leftjoin solution might be more efficient, if you don’t want to change the transform part:

using DataFrames, BenchmarkTools

df = DataFrame(id = sort(repeat(1:300000, 10)),
               time = repeat(0:10:90, 300000),
               conc = rand(3000000))

@btime transform(groupby(df, 1), [:time, :conc] => (x, y) -> first(y[x .== 20]))
  75.705 ms (900817 allocations: 302.45 MiB)

@btime leftjoin(df, df[df.time .== 20, [1, 3]], on = [:id], makeunique = true)
  59.999 ms (345 allocations: 175.26 MiB)
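
Before relying on either timing, it may be worth checking that the two approaches produce the same column. A small sketch, assuming only DataFrames is loaded; the smaller group count n, the names a, b and lookup, and the explicit output name conc_20hr are choices made here for illustration, not part of the benchmark above.

```julia
using DataFrames

n = 1_000  # far fewer groups than the benchmark, just for a quick check
df = DataFrame(id = sort(repeat(1:n, 10)),
               time = repeat(0:10:90, n),
               conc = rand(10n))

# Grouped transform; the trailing `=> :conc_20hr` names the output column.
a = transform(groupby(df, :id),
              [:time, :conc] => ((t, c) -> first(c[t .== 20])) => :conc_20hr)

# Join-based version; rename the looked-up column before joining.
lookup = rename(df[df.time .== 20, [:id, :conc]], :conc => :conc_20hr)
b = leftjoin(df, lookup, on = :id)

# Sort both results the same way so the comparison does not depend on
# the row order returned by leftjoin.
sort!(a, [:id, :time])
sort!(b, [:id, :time])
@assert a.conc_20hr == b.conc_20hr
```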
