Summing and counting values in a df by row

I have a dataframe of the form:

4×4 DataFrame
 Row │ ID     Q1     Q2       Q3
     │ Int64  Int64  Int64?   Int64?
─────┼────────────────────────────────
   1 │     1      1        0        1
   2 │     2      1        1        0
   3 │     3      0        0  missing
   4 │     4      0  missing        1

The data are quiz answers, and missing indicates that the person did not attempt the question.

I want to add two columns:

  • :score for the number of correct responses; a* nd,
  • :attempt for the number of questions attempted.

Expected result

4×6 DataFrame
 Row │ ID     Q1     Q2       Q3       score  attempts
     │ Int64  Int64  Int64?   Int64?   Int64  Int64
─────┼────────────────────────────────────────────────
   1 │     1      1        0        1      2        3
   2 │     2      1        1        0      2        3
   3 │     3      0        0  missing      0        2
   4 │     4      0  missing        1      1        2

My best effort so far is:

julia> df1 = transform(df, r"Q" => (+) => :score)

4×5 DataFrame
 Row │ ID     Q1     Q2       Q3       score
     │ Int64  Int64  Int64?   Int64?   Int64?
─────┼─────────────────────────────────────────
   1 │     1      1        0        1        2
   2 │     2      1        1        0        2
   3 │     3      0        0  missing  missing
   4 │     4      0  missing        1  missing

which doesn’t give me totals for rows with missing values. I have tried various permutations of skipmissing without success. Also I can’t quite work out an expression to count the non-missing values in each row.

Thanks in advance.

EDIT:
@pdeffebach provides an efficient solution to the :score problem. I also found a possibly less efficient solution which I can also apply to the :attempts problem.

transform(df, r"Q" => ByRow((x...) -> sum(skipmissing(x))) => :sum)
transform(df1, r"Q" => ByRow((x...) -> count(>(-1),skipmissing(x))) => :attempts)
1 Like

This is a tricky one, and is only really answered by an in depth understanding of the src => fun => dest syntax.

you want

julia> transform(df, AsTable([:a, :b]) => ByRow(sum ∘ skipmissing) => :score)
2×3 DataFrame
 Row │ a      b        score
     │ Int64  Int64?   Int64
─────┼───────────────────────
   1 │     1        3      4
   2 │     2  missing      2

the AsTable combined with ByRow means the function takes a named tuple. skipmissing works just fine on a named tuple.

4 Likes

Thanks again @pdeffebach
It’s clear I have to get my head around the src => fun => dest syntax.

Your solution worked, though I did just find an alternative solution which also worked for me. Not sure if it is as efficient but for me it is more comprehensible.

transform(df, r"Q" => ByRow((x...) -> sum(skipmissing(x))) => :sum)

Another way:

using DataFrames

ID = 1:4
Q1 = [1, 1, 0, 0]
Q2 = [0, 1, 0, missing]
Q3 = [1, 0, missing, 1]
df = DataFrame(ID=ID, Q1=Q1, Q2=Q2, Q3=Q3)

df[:score] = sum(eachcol(coalesce.(df[Not(:ID)],0)))
df[:attempts] = sum((eachcol(.~ismissing.(df[[:Q1,:Q2,:Q3]]))))

julia> df
4×6 DataFrame
│ Row │ ID    │ Q1    │ Q2      │ Q3      │ score │ attempts │
│     │ Int64 │ Int64 │ Int64?  │ Int64?  │ Int64 │ Int64    │
├─────┼───────┼───────┼─────────┼─────────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0       │ 1       │ 2     │ 3        │
│ 2   │ 2     │ 1     │ 1       │ 0       │ 2     │ 3        │
│ 3   │ 3     │ 0     │ 0       │ missing │ 0     │ 2        │
│ 4   │ 4     │ 0     │ missing │ 1       │ 1     │ 2        │
3 Likes