Creating a new dataframe from another based on mapping between variables

I have dataframe where one column contains information about student grades from 1 to 13. Let’s call that GradeA. Another column contains information about how many students got the particular GradeA.

Now consider that another grading system exists that is much less refined than GradeA, having only 5 possible grades. Let’s call that GradeB.

Also consider that GradeB maps perfectly onto GradeA, but in a not so obvious way, because GradeB’s 1, 2 and 3, say, may be GradeB’s 1 and 5 (for example) instead of the more intuitive 1 and 2.

My question is this: what function that inputs a particular grade from GradeB can create a new dataframe that contains all the grades that correspond to that grade on the GradeA scale, along with corresponding number of students?

Example:
Suppose GradeB of 3 corresponds to GradeA of 1,2, and 5 on the GradeA scale, how might I create a dataframe from my original dataframe that only includes the grades 1,2 and 5 without having to write something like:

new_df = old_df[(old_df[!,:GradeA].== 1) .|| (old_df[!,:GradeA].== 2) .|| (old_df[!,:GradeA].== 5),:]

How about using joins?

using DataFrames

dfA = DataFrame(GradeA=1:5, Count=[1,1,2,3,5])
BtoA_map = [1 => [3], 2 => [4], 3 => [1,2,5]]

dfB = DataFrame(GradeB=[], GradeA=[])

for p in BtoA_map
    for i in p.second
        push!(dfB, (p.first, i))
    end
 end

leftjoin(dfB, dfA, on=:GradeA)

gives

5×3 DataFrame
 Row │ GradeB  GradeA  Count
     │ Any     Any     Int64?
─────┼────────────────────────
   1 │ 3       1            1
   2 │ 3       2            1
   3 │ 1       3            2
   4 │ 2       4            3
   5 │ 3       5            5

@jd-foster was faster :slight_smile: Here is my attempt…

df = DataFrame( gradeA = [1,2,3,4,5], gradeA_count = [10,20,30,40,50] )

# define a function for mapping from gradeA to gradeB. Toy example:
function grade_map(a)
  if a in (1,2,5)
    return 1
  elseif a in (3,)
    return 2
  else 
    return 3
  end
end

df.gradeB = grade_map.( df.gradeA )
dfB = combine( groupby( df, :gradeB ), :gradeA_count => sum => :gradeB_count )