Transform counts to binary variable in DataFrame

I have a DataFrame containing counts of a binary variable (yes/no). I would like to code the counts as instances of 1 = yes and no = 0.

Here is a simple example of the Dataframe with the same structure as my dataset:

initial DataFrame

 Row │ subject  response  count 
     │ Int64    String    Int64 
─────┼──────────────────────────
   1 │       1  yes           2
   2 │       1  no            0
   3 │       2  yes           2
   4 │       2  no            1

The example below shows the desired result after coding counts of yes to 1 and counts of no to 0

desired DataFrame

 Row │ subject  y     
     │ Int64    Int64 
─────┼────────────────
   1 │       1      1
   2 │       1      1
   3 │       2      1
   4 │       2      1
   5 │       2      0

How can this be accomplished with DataFrames.jl or another package?

Thank you.

MWE

using DataFrames

df1 = DataFrame(
    subject = [1,1,2,2],
    response = ["yes","no","yes","no"],
    count = [2,0,2,1]
)


df2 = DataFrame(
    subject = [1,1,2,2,2],
    y = [1,1,1,1,0],
)

Here’s one way:

df2 = combine(df1, :subject, [:response, :count] =>
                             ByRow((r, c) -> fill(r == "yes", c)) =>
                             :y)

julia> df3 = flatten(df2, :y)
5×2 DataFrame
 Row │ subject  y     
     │ Int64    Bool  
─────┼────────────────
   1 │       1   true
   2 │       1   true
   3 │       2   true
   4 │       2   true
   5 │       2  false

This makes Bool values (which are also a type of integer). If you really want Int64, you can change r == "yes" to Int64(r == "yes") or r == "yes" ? 1 : 0.

1 Like

Very nice! Thank you.

Here is a solution with DataFramesMeta

julia> using DataFramesMeta

julia> df1 = DataFrame(           subject = [1,1,2,2],
           response = ["yes","no","yes","no"],
           count = [2,0,2,1]
       );

julia> @chain df1 begin 
           @rselect begin   
               :subject
               :y = fill(:response == "yes", :count)
           end
           flatten(:y)
       end
5×2 DataFrame
 Row │ subject  y     
     │ Int64    Bool  
─────┼────────────────
   1 │       1   true
   2 │       1   true
   3 │       2   true
   4 │       2   true
   5 │       2  false
1 Like

One more approach, inspired by this post:

v1 = vcat(fill.(df1.subject, df1.count)...)
v2 = vcat(fill.(df1.response, df1.count)...) .== "yes"
df2 = DataFrame(subject = v1, y = Int64.(v2))

5×2 DataFrame
 Row │ subject  y     
     │ Int64    Int64 
─────┼────────────────
   1 │       1      1
   2 │       1      1
   3 │       2      1
   4 │       2      1
   5 │       2      0
1 Like

@rafael.guerra, @pdeffebach, thank you for your solutions. These are all great.

1 Like