Creating a new dataframe from another based on mapping between variables

Nash · September 26, 2022, 10:37am

I have dataframe where one column contains information about student grades from 1 to 13. Let’s call that GradeA. Another column contains information about how many students got the particular GradeA.

Now consider that another grading system exists that is much less refined than GradeA, having only 5 possible grades. Let’s call that GradeB.

Also consider that GradeB maps perfectly onto GradeA, but in a not so obvious way, because GradeB’s 1, 2 and 3, say, may be GradeB’s 1 and 5 (for example) instead of the more intuitive 1 and 2.

My question is this: what function that inputs a particular grade from GradeB can create a new dataframe that contains all the grades that correspond to that grade on the GradeA scale, along with corresponding number of students?

Example:
Suppose GradeB of 3 corresponds to GradeA of 1,2, and 5 on the GradeA scale, how might I create a dataframe from my original dataframe that only includes the grades 1,2 and 5 without having to write something like:

new_df = old_df[(old_df[!,:GradeA].== 1) .|| (old_df[!,:GradeA].== 2) .|| (old_df[!,:GradeA].== 5),:]

jd-foster · September 26, 2022, 11:17am

How about using joins?

using DataFrames

dfA = DataFrame(GradeA=1:5, Count=[1,1,2,3,5])
BtoA_map = [1 => [3], 2 => [4], 3 => [1,2,5]]

dfB = DataFrame(GradeB=[], GradeA=[])

for p in BtoA_map
    for i in p.second
        push!(dfB, (p.first, i))
    end
 end

leftjoin(dfB, dfA, on=:GradeA)

gives

5×3 DataFrame
 Row │ GradeB  GradeA  Count
     │ Any     Any     Int64?
─────┼────────────────────────
   1 │ 3       1            1
   2 │ 3       2            1
   3 │ 1       3            2
   4 │ 2       4            3
   5 │ 3       5            5

SteffenPL · September 26, 2022, 11:33am

@jd-foster was faster Here is my attempt…

df = DataFrame( gradeA = [1,2,3,4,5], gradeA_count = [10,20,30,40,50] )

# define a function for mapping from gradeA to gradeB. Toy example:
function grade_map(a)
  if a in (1,2,5)
    return 1
  elseif a in (3,)
    return 2
  else 
    return 3
  end
end

df.gradeB = grade_map.( df.gradeA )
dfB = combine( groupby( df, :gradeB ), :gradeA_count => sum => :gradeB_count )

Topic		Replies	Views
New Dataframe Column from a View from Another DataFrames General Usage data , dataframes	1	385	January 26, 2021
Dataframes difficult joins Data data , dataframes , data_structures	3	593	February 10, 2021
Create new column in dataframe based on list General Usage question , dataframes	5	650	June 4, 2022
Create new dataframe with minor changes New to Julia dataframes , copy	11	417	April 30, 2022
Creating one column from another New to Julia	5	296	September 8, 2022

Creating a new dataframe from another based on mapping between variables

Related topics